Message ID | cae511f90085701e7093ce39dc8dabf8fc16b844.1522168131.git-series.kieran.bingham@ideasonboard.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Kieran, I've got a question: On Tue, 27 Mar 2018, Kieran Bingham wrote: > Newer high definition cameras, and cameras with multiple lenses such as > the range of stereo-vision cameras now available have ever increasing > data rates. > > The inclusion of a variable length packet header in URB packets mean > that we must memcpy the frame data out to our destination 'manually'. > This can result in data rates of up to 2 gigabits per second being > processed. > > To improve efficiency, and maximise throughput, handle the URB decode > processing through a work queue to move it from interrupt context, and > allow multiple processors to work on URBs in parallel. > > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com> > > --- > v2: > - Lock full critical section of usb_submit_urb() > > v3: > - Fix race on submitting uvc_video_decode_data_work() to work queue. > - Rename uvc_decode_op -> uvc_copy_op (Generic to encode/decode) > - Rename decodes -> copy_operations > - Don't queue work if there is no async task > - obtain copy op structure directly in uvc_video_decode_data() > - uvc_video_decode_data_work() -> uvc_video_copy_data_work() > > v4: > - Provide for_each_uvc_urb() > - Simplify fix for shutdown race to flush queue before freeing URBs > - Rebase to v4.16-rc4 (linux-media/master) adjusting for metadata > conflicts. > > drivers/media/usb/uvc/uvc_video.c | 107 ++++++++++++++++++++++++------- > drivers/media/usb/uvc/uvcvideo.h | 28 ++++++++- > 2 files changed, 111 insertions(+), 24 deletions(-) > > diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c > index 7dd0dcb457f3..a62e8caf367c 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -1042,21 +1042,54 @@ static int uvc_video_decode_start(struct uvc_streaming *stream, > return data[0]; > } > > -static void uvc_video_decode_data(struct uvc_streaming *stream, > +/* > + * uvc_video_decode_data_work: Asynchronous memcpy processing > + * > + * Perform memcpy tasks in process context, with completion handlers > + * to return the URB, and buffer handles. > + */ > +static void uvc_video_copy_data_work(struct work_struct *work) > +{ > + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work); > + unsigned int i; > + int ret; > + > + for (i = 0; i < uvc_urb->async_operations; i++) { > + struct uvc_copy_op *op = &uvc_urb->copy_operations[i]; > + > + memcpy(op->dst, op->src, op->len); > + > + /* Release reference taken on this buffer */ > + uvc_queue_buffer_release(op->buf); > + } > + > + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); Does this still have to be ATOMIC now that it's called from a work queue context? > + if (ret < 0) > + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", > + ret); > +} [snip] Thannks Guennadi
Hi Guennadi, Thank you for reviewing / taking a look through the series. On 04/06/18 13:09, Guennadi Liakhovetski wrote: > Hi Kieran, > > I've got a question: > > On Tue, 27 Mar 2018, Kieran Bingham wrote: > >> Newer high definition cameras, and cameras with multiple lenses such as >> the range of stereo-vision cameras now available have ever increasing >> data rates. >> >> The inclusion of a variable length packet header in URB packets mean >> that we must memcpy the frame data out to our destination 'manually'. >> This can result in data rates of up to 2 gigabits per second being >> processed. >> >> To improve efficiency, and maximise throughput, handle the URB decode >> processing through a work queue to move it from interrupt context, and >> allow multiple processors to work on URBs in parallel. >> >> Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com> >> >> --- >> v2: >> - Lock full critical section of usb_submit_urb() >> >> v3: >> - Fix race on submitting uvc_video_decode_data_work() to work queue. >> - Rename uvc_decode_op -> uvc_copy_op (Generic to encode/decode) >> - Rename decodes -> copy_operations >> - Don't queue work if there is no async task >> - obtain copy op structure directly in uvc_video_decode_data() >> - uvc_video_decode_data_work() -> uvc_video_copy_data_work() >> >> v4: >> - Provide for_each_uvc_urb() >> - Simplify fix for shutdown race to flush queue before freeing URBs >> - Rebase to v4.16-rc4 (linux-media/master) adjusting for metadata >> conflicts. >> >> drivers/media/usb/uvc/uvc_video.c | 107 ++++++++++++++++++++++++------- >> drivers/media/usb/uvc/uvcvideo.h | 28 ++++++++- >> 2 files changed, 111 insertions(+), 24 deletions(-) >> >> diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c >> index 7dd0dcb457f3..a62e8caf367c 100644 >> --- a/drivers/media/usb/uvc/uvc_video.c >> +++ b/drivers/media/usb/uvc/uvc_video.c >> @@ -1042,21 +1042,54 @@ static int uvc_video_decode_start(struct uvc_streaming *stream, >> return data[0]; >> } >> >> -static void uvc_video_decode_data(struct uvc_streaming *stream, >> +/* >> + * uvc_video_decode_data_work: Asynchronous memcpy processing >> + * >> + * Perform memcpy tasks in process context, with completion handlers >> + * to return the URB, and buffer handles. >> + */ >> +static void uvc_video_copy_data_work(struct work_struct *work) >> +{ >> + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work); >> + unsigned int i; >> + int ret; >> + >> + for (i = 0; i < uvc_urb->async_operations; i++) { >> + struct uvc_copy_op *op = &uvc_urb->copy_operations[i]; >> + >> + memcpy(op->dst, op->src, op->len); >> + >> + /* Release reference taken on this buffer */ >> + uvc_queue_buffer_release(op->buf); >> + } >> + >> + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); > > Does this still have to be ATOMIC now that it's called from a work queue > context? I think you're right. This could very likely be changed to GFP_KERNEL. Does this series impact anything on your async-controls series ? -- Kieran > >> + if (ret < 0) >> + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", >> + ret); >> +} > > [snip] > > Thannks > Guennadi >
Hi Kieran, Thank you for the patch. On Tuesday, 27 March 2018 19:46:03 EEST Kieran Bingham wrote: > Newer high definition cameras, and cameras with multiple lenses such as > the range of stereo-vision cameras now available have ever increasing > data rates. > > The inclusion of a variable length packet header in URB packets mean > that we must memcpy the frame data out to our destination 'manually'. > This can result in data rates of up to 2 gigabits per second being > processed. > > To improve efficiency, and maximise throughput, handle the URB decode > processing through a work queue to move it from interrupt context, and > allow multiple processors to work on URBs in parallel. > > Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com> > > --- > v2: > - Lock full critical section of usb_submit_urb() > > v3: > - Fix race on submitting uvc_video_decode_data_work() to work queue. > - Rename uvc_decode_op -> uvc_copy_op (Generic to encode/decode) > - Rename decodes -> copy_operations > - Don't queue work if there is no async task > - obtain copy op structure directly in uvc_video_decode_data() > - uvc_video_decode_data_work() -> uvc_video_copy_data_work() > > v4: > - Provide for_each_uvc_urb() > - Simplify fix for shutdown race to flush queue before freeing URBs Indeed, v4 looks simpler, I like it. > - Rebase to v4.16-rc4 (linux-media/master) adjusting for metadata > conflicts. > > drivers/media/usb/uvc/uvc_video.c | 107 ++++++++++++++++++++++++------- > drivers/media/usb/uvc/uvcvideo.h | 28 ++++++++- > 2 files changed, 111 insertions(+), 24 deletions(-) > > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c index 7dd0dcb457f3..a62e8caf367c 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -1042,21 +1042,54 @@ static int uvc_video_decode_start(struct > uvc_streaming *stream, return data[0]; > } > > -static void uvc_video_decode_data(struct uvc_streaming *stream, > +/* > + * uvc_video_decode_data_work: Asynchronous memcpy processing > + * > + * Perform memcpy tasks in process context, with completion handlers > + * to return the URB, and buffer handles. By "and buffer handles" do you mean you release the buffer reference ? I'd then write this as "Copy URB data to video buffers in process context, releasing buffer references and requeuing the URB when done." > + */ > +static void uvc_video_copy_data_work(struct work_struct *work) > +{ > + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work); > + unsigned int i; > + int ret; > + > + for (i = 0; i < uvc_urb->async_operations; i++) { > + struct uvc_copy_op *op = &uvc_urb->copy_operations[i]; > + > + memcpy(op->dst, op->src, op->len); > + > + /* Release reference taken on this buffer */ Missing period at the end of the sentence, here and in other locations in this patch. > + uvc_queue_buffer_release(op->buf); > + } > + > + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); > + if (ret < 0) > + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", > + ret); > +} > + > +static void uvc_video_decode_data(struct uvc_urb *uvc_urb, > struct uvc_buffer *buf, const u8 *data, int len) > { > - unsigned int maxlen, nbytes; > - void *mem; > + unsigned int active_op = uvc_urb->async_operations; > + struct uvc_copy_op *decode = &uvc_urb->copy_operations[active_op]; How about calling the variable "op" as it's a copy operation ? > + unsigned int maxlen; > > if (len <= 0) > return; > > - /* Copy the video data to the buffer. */ > maxlen = buf->length - buf->bytesused; > - mem = buf->mem + buf->bytesused; > - nbytes = min((unsigned int)len, maxlen); > - memcpy(mem, data, nbytes); > - buf->bytesused += nbytes; > + > + /* Take a buffer reference for async work */ > + kref_get(&buf->ref); > + > + decode->buf = buf; > + decode->src = data; > + decode->dst = buf->mem + buf->bytesused; > + decode->len = min_t(unsigned int, len, maxlen); > + > + buf->bytesused += decode->len; > > /* Complete the current frame if the buffer size was exceeded. */ > if (len > maxlen) { > @@ -1064,6 +1097,8 @@ static void uvc_video_decode_data(struct uvc_streaming > *stream, buf->error = 1; > buf->state = UVC_BUF_STATE_READY; > } > + > + uvc_urb->async_operations++; > } > > static void uvc_video_decode_end(struct uvc_streaming *stream, > @@ -1272,7 +1307,7 @@ static void uvc_video_decode_isoc(struct uvc_urb > *uvc_urb, uvc_video_decode_meta(stream, meta_buf, mem, ret); > > /* Decode the payload data. */ > - uvc_video_decode_data(stream, buf, mem + ret, > + uvc_video_decode_data(uvc_urb, buf, mem + ret, > urb->iso_frame_desc[i].actual_length - ret); > > /* Process the header again. */ > @@ -1334,9 +1369,9 @@ static void uvc_video_decode_bulk(struct uvc_urb > *uvc_urb, * sure buf is never dereferenced if NULL. > */ > > - /* Process video data. */ > + /* Prepare video data for processing. */ > if (!stream->bulk.skip_payload && buf != NULL) > - uvc_video_decode_data(stream, buf, mem, len); > + uvc_video_decode_data(uvc_urb, buf, mem, len); > > /* Detect the payload end by a URB smaller than the maximum size (or > * a payload size equal to the maximum) and process the header again. > @@ -1422,7 +1457,7 @@ static void uvc_video_complete(struct urb *urb) > uvc_printk(KERN_WARNING, "Non-zero status (%d) in video " > "completion handler.\n", urb->status); > /* fall through */ > - case -ENOENT: /* usb_kill_urb() called. */ > + case -ENOENT: /* usb_poison_urb() called. */ > if (stream->frozen) > return; > /* fall through */ > @@ -1436,6 +1471,9 @@ static void uvc_video_complete(struct urb *urb) > > buf = uvc_queue_get_current_buffer(queue); > > + /* Re-initialise the URB async work. */ > + uvc_urb->async_operations = 0; > + I'd move this just after the vb2_qmeta block to keep the buf and buf_meta code together. > if (vb2_qmeta) { > spin_lock_irqsave(&qmeta->irqlock, flags); > if (!list_empty(&qmeta->irqqueue)) > @@ -1444,12 +1482,24 @@ static void uvc_video_complete(struct urb *urb) > spin_unlock_irqrestore(&qmeta->irqlock, flags); > } > > + /* > + * Process the URB headers, and optionally queue expensive memcpy tasks > + * to be deferred to a work queue. > + */ > stream->decode(uvc_urb, buf, buf_meta); > > - if ((ret = usb_submit_urb(urb, GFP_ATOMIC)) < 0) { > - uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", > - ret); > + /* If no async work is needed, resubmit the URB immediately. */ > + if (!uvc_urb->async_operations) { > + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); > + if (ret < 0) > + uvc_printk(KERN_ERR, > + "Failed to resubmit video URB (%d).\n", > + ret); > + return; > } > + > + INIT_WORK(&uvc_urb->work, uvc_video_copy_data_work); Do you need to reinit the work every time ? I thought that once after allocating it was enough. I could be wrong though. > + queue_work(stream->async_wq, &uvc_urb->work); > } > > /* > @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct > uvc_streaming *stream, */ > static void uvc_uninit_video(struct uvc_streaming *stream, int > free_buffers) { > - struct urb *urb; > - unsigned int i; > + struct uvc_urb *uvc_urb; > > uvc_video_stats_stop(stream); > > - for (i = 0; i < UVC_URBS; ++i) { > - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; > + /* > + * We must poison the URBs rather than kill them to ensure that even > + * after the completion handler returns, any asynchronous workqueues > + * will be prevented from resubmitting the URBs Missing period. > + */ > + for_each_uvc_urb(uvc_urb, stream) > + usb_poison_urb(uvc_urb->urb); > > - urb = uvc_urb->urb; > - if (urb == NULL) > - continue; > + flush_workqueue(stream->async_wq); > > - usb_kill_urb(urb); > - usb_free_urb(urb); > + for_each_uvc_urb(uvc_urb, stream) { > + usb_free_urb(uvc_urb->urb); > uvc_urb->urb = NULL; > } > > if (free_buffers) > uvc_free_urb_buffers(stream); > + > + destroy_workqueue(stream->async_wq); > } > > /* > @@ -1720,6 +1774,11 @@ static int uvc_init_video(struct uvc_streaming > *stream, gfp_t gfp_flags) > > uvc_video_stats_start(stream); > > + stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, > + 0); > + if (!stream->async_wq) > + return -ENOMEM; > + Is there a reason to allocate the workqueue at stream on time and destroy it at stream off time ? Does it consume resources if we keep it allocated without work queued ? > if (intf->num_altsetting > 1) { > struct usb_host_endpoint *best_ep = NULL; > unsigned int best_psize = UINT_MAX; > diff --git a/drivers/media/usb/uvc/uvcvideo.h > b/drivers/media/usb/uvc/uvcvideo.h index 112eed49bf50..27c230430eda 100644 > --- a/drivers/media/usb/uvc/uvcvideo.h > +++ b/drivers/media/usb/uvc/uvcvideo.h > @@ -485,12 +485,30 @@ struct uvc_stats_stream { > #define UVC_METATADA_BUF_SIZE 1024 > > /** > + * struct uvc_copy_op: Context structure to schedule asynchronous memcpy > + * > + * @buf: active buf object for this operation > + * @dst: copy destination address > + * @src: copy source address > + * @len: copy length > + */ > +struct uvc_copy_op { > + struct uvc_buffer *buf; > + void *dst; > + const __u8 *src; > + size_t len; > +}; > + > +/** > * struct uvc_urb - URB context management structure > * > * @urb: the URB described by this context structure > * @stream: UVC streaming context > * @buffer: memory storage for the URB > * @dma: DMA coherent addressing for the urb_buffer > + * @async_operations: counter to indicate the number of copy operations > + * @copy_operations: work descriptors for asynchronous copy operations > + * @work: work queue entry for asynchronous decode > */ > struct uvc_urb { > struct urb *urb; > @@ -498,6 +516,10 @@ struct uvc_urb { > > char *buffer; > dma_addr_t dma; > + > + unsigned int async_operations; > + struct uvc_copy_op copy_operations[UVC_MAX_PACKETS]; > + struct work_struct work; > }; > > struct uvc_streaming { > @@ -530,6 +552,7 @@ struct uvc_streaming { > /* Buffers queue. */ > unsigned int frozen : 1; > struct uvc_video_queue queue; > + struct workqueue_struct *async_wq; > void (*decode)(struct uvc_urb *uvc_urb, struct uvc_buffer *buf, > struct uvc_buffer *meta_buf); > > @@ -583,6 +606,11 @@ struct uvc_streaming { > } clock; > }; > > +#define for_each_uvc_urb(uvc_urb, uvc_streaming) \ > + for (uvc_urb = &uvc_streaming->uvc_urb[0]; \ > + uvc_urb < &uvc_streaming->uvc_urb[UVC_URBS]; \ > + ++uvc_urb) > + > struct uvc_device { > struct usb_device *udev; > struct usb_interface *intf;
Hi Kieran, On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham <kieran.bingham@ideasonboard.com> wrote: [snip] > @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream, > */ > static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) > { > - struct urb *urb; > - unsigned int i; > + struct uvc_urb *uvc_urb; > > uvc_video_stats_stop(stream); > > - for (i = 0; i < UVC_URBS; ++i) { > - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; > + /* > + * We must poison the URBs rather than kill them to ensure that even > + * after the completion handler returns, any asynchronous workqueues > + * will be prevented from resubmitting the URBs > + */ > + for_each_uvc_urb(uvc_urb, stream) > + usb_poison_urb(uvc_urb->urb); > > - urb = uvc_urb->urb; > - if (urb == NULL) > - continue; > + flush_workqueue(stream->async_wq); > > - usb_kill_urb(urb); > - usb_free_urb(urb); > + for_each_uvc_urb(uvc_urb, stream) { > + usb_free_urb(uvc_urb->urb); > uvc_urb->urb = NULL; > } > > if (free_buffers) > uvc_free_urb_buffers(stream); > + > + destroy_workqueue(stream->async_wq); In our testing, this function ends up being called twice, if before suspend the camera is streaming and if the camera disconnects between suspend and resume. This is because uvc_video_suspend() calls this function (with free_buffers = 0), but uvc_video_resume() wouldn't call uvc_init_video() due to an earlier failure and uvc_v4l2_release() would end up calling this function again, while the workqueue is already destroyed. The following diff seems to take care of it: 8<~~~ diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c index c5e0ab564b1a..6fb890c8ba67 100644 --- a/drivers/media/usb/uvc/uvc_video.c +++ b/drivers/media/usb/uvc/uvc_video.c @@ -1493,10 +1493,11 @@ static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) uvc_urb->urb = NULL; } - if (free_buffers) + if (free_buffers) { uvc_free_urb_buffers(stream); - - destroy_workqueue(stream->async_wq); + destroy_workqueue(stream->async_wq); + stream->async_wq = NULL; + } } /* @@ -1648,10 +1649,12 @@ static int uvc_init_video(struct uvc_streaming *stream, gfp_t gfp_flags) uvc_video_stats_start(stream); - stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, - 0); - if (!stream->async_wq) - return -ENOMEM; + if (!stream->async_wq) { + stream->async_wq = alloc_workqueue("uvcvideo", + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!stream->async_wq) + return -ENOMEM; + } if (intf->num_altsetting > 1) { struct usb_host_endpoint *best_ep = NULL; ~~~>8 Best regards, Tomasz
Hi Tomasz, On Tuesday, 7 August 2018 12:54:02 EEST Tomasz Figa wrote: > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham wrote: > [snip] > > > @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct > > uvc_streaming *stream, > > */ > > > > static void uvc_uninit_video(struct uvc_streaming *stream, int > > free_buffers) > > { > > - struct urb *urb; > > - unsigned int i; > > + struct uvc_urb *uvc_urb; > > > > uvc_video_stats_stop(stream); > > > > - for (i = 0; i < UVC_URBS; ++i) { > > - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; > > + /* > > + * We must poison the URBs rather than kill them to ensure that > > even > > + * after the completion handler returns, any asynchronous > > workqueues > > + * will be prevented from resubmitting the URBs > > + */ > > + for_each_uvc_urb(uvc_urb, stream) > > + usb_poison_urb(uvc_urb->urb); > > > > - urb = uvc_urb->urb; > > - if (urb == NULL) > > - continue; > > + flush_workqueue(stream->async_wq); > > > > - usb_kill_urb(urb); > > - usb_free_urb(urb); > > + for_each_uvc_urb(uvc_urb, stream) { > > + usb_free_urb(uvc_urb->urb); > > uvc_urb->urb = NULL; > > } > > > > if (free_buffers) > > uvc_free_urb_buffers(stream); > > + > > + destroy_workqueue(stream->async_wq); > > In our testing, this function ends up being called twice, if before > suspend the camera is streaming and if the camera disconnects between > suspend and resume. This is because uvc_video_suspend() calls this > function (with free_buffers = 0), but uvc_video_resume() wouldn't call > uvc_init_video() due to an earlier failure and uvc_v4l2_release() > would end up calling this function again, while the workqueue is > already destroyed. This makes me wonder, as stated in my review, whether we shouldn't keep the work queue alive for the whole lifetime of the device instead of creating and destroying it at stream start and stop. > The following diff seems to take care of it: > > 8<~~~ > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c > index c5e0ab564b1a..6fb890c8ba67 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -1493,10 +1493,11 @@ static void uvc_uninit_video(struct > uvc_streaming *stream, int free_buffers) > uvc_urb->urb = NULL; > } > > - if (free_buffers) > + if (free_buffers) { > uvc_free_urb_buffers(stream); > - > - destroy_workqueue(stream->async_wq); > + destroy_workqueue(stream->async_wq); > + stream->async_wq = NULL; > + } > } > > /* > @@ -1648,10 +1649,12 @@ static int uvc_init_video(struct uvc_streaming > *stream, gfp_t gfp_flags) > > uvc_video_stats_start(stream); > > - stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | > WQ_HIGHPRI, - 0); > - if (!stream->async_wq) > - return -ENOMEM; > + if (!stream->async_wq) { > + stream->async_wq = alloc_workqueue("uvcvideo", > + WQ_UNBOUND | WQ_HIGHPRI, > 0); + if (!stream->async_wq) > + return -ENOMEM; > + } > > if (intf->num_altsetting > 1) { > struct usb_host_endpoint *best_ep = NULL; > ~~~>8
Hi Tomasz, On Tuesday, 7 August 2018 12:54:02 EEST Tomasz Figa wrote: > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham wrote: [snip] > In our testing, this function ends up being called twice In your testing, has this patch series brought noticeable performance improvements ? Is there a particular reason you tested it, beside general support of UVC devices in ChromeOS ? [snip]
Hi Laurent, On Wed, Aug 8, 2018 at 8:12 AM Laurent Pinchart <laurent.pinchart@ideasonboard.com> wrote: > > Hi Tomasz, > > On Tuesday, 7 August 2018 12:54:02 EEST Tomasz Figa wrote: > > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham wrote: > > [snip] > > > In our testing, this function ends up being called twice > > In your testing, has this patch series brought noticeable performance > improvements ? Is there a particular reason you tested it, beside general > support of UVC devices in ChromeOS ? Some of our older ARM devices have external USB ports wired to a low end dwc2 controller, which puts quite strict timing requirements on interrupt handling. For some cameras that produce bigger payloads (in our testing that was Logitech BRIO, running at 1080p), almost half of every frame would be dropped, due to the memcpy from uncached memory taking too much time. With this series, it goes down to bottom ~10% of only a part of the frames. With one more optimization from Keiichi [1], the problem disappears almost completely. [1] https://lore.kernel.org/patchwork/patch/956388/ Best regards, Tomasz
Hi Tomasz, On 07/08/18 10:54, Tomasz Figa wrote: > Hi Kieran, > > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham > <kieran.bingham@ideasonboard.com> wrote: > [snip] >> @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream, >> */ >> static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) >> { >> - struct urb *urb; >> - unsigned int i; >> + struct uvc_urb *uvc_urb; >> >> uvc_video_stats_stop(stream); >> >> - for (i = 0; i < UVC_URBS; ++i) { >> - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; >> + /* >> + * We must poison the URBs rather than kill them to ensure that even >> + * after the completion handler returns, any asynchronous workqueues >> + * will be prevented from resubmitting the URBs >> + */ >> + for_each_uvc_urb(uvc_urb, stream) >> + usb_poison_urb(uvc_urb->urb); >> >> - urb = uvc_urb->urb; >> - if (urb == NULL) >> - continue; >> + flush_workqueue(stream->async_wq); >> >> - usb_kill_urb(urb); >> - usb_free_urb(urb); >> + for_each_uvc_urb(uvc_urb, stream) { >> + usb_free_urb(uvc_urb->urb); >> uvc_urb->urb = NULL; >> } >> >> if (free_buffers) >> uvc_free_urb_buffers(stream); >> + >> + destroy_workqueue(stream->async_wq); > > In our testing, this function ends up being called twice, if before > suspend the camera is streaming and if the camera disconnects between > suspend and resume. This is because uvc_video_suspend() calls this > function (with free_buffers = 0), but uvc_video_resume() wouldn't call > uvc_init_video() due to an earlier failure and uvc_v4l2_release() > would end up calling this function again, while the workqueue is > already destroyed. > > The following diff seems to take care of it: Thank you for the investigation and patch report ;D I think moving the workqueue allocation might be a reasonable option as suggested by Laurent in his review. I'll look further into this when I get to another spin of the series. > > 8<~~~ > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c > index c5e0ab564b1a..6fb890c8ba67 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -1493,10 +1493,11 @@ static void uvc_uninit_video(struct > uvc_streaming *stream, int free_buffers) > uvc_urb->urb = NULL; > } > > - if (free_buffers) > + if (free_buffers) { > uvc_free_urb_buffers(stream); > - > - destroy_workqueue(stream->async_wq); > + destroy_workqueue(stream->async_wq); > + stream->async_wq = NULL; > + } > } > > /* > @@ -1648,10 +1649,12 @@ static int uvc_init_video(struct uvc_streaming > *stream, gfp_t gfp_flags) > > uvc_video_stats_start(stream); > > - stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, > - 0); > - if (!stream->async_wq) > - return -ENOMEM; > + if (!stream->async_wq) { > + stream->async_wq = alloc_workqueue("uvcvideo", > + WQ_UNBOUND | WQ_HIGHPRI, 0); > + if (!stream->async_wq) > + return -ENOMEM; > + } > > if (intf->num_altsetting > 1) { > struct usb_host_endpoint *best_ep = NULL; > ~~~>8 > > Best regards, > Tomasz >
Hi Tomasz, On 07/08/2018 10:54, Tomasz Figa wrote: > Hi Kieran, > > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham > <kieran.bingham@ideasonboard.com> wrote: > [snip] >> @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream, >> */ >> static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) >> { >> - struct urb *urb; >> - unsigned int i; >> + struct uvc_urb *uvc_urb; >> >> uvc_video_stats_stop(stream); >> >> - for (i = 0; i < UVC_URBS; ++i) { >> - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; >> + /* >> + * We must poison the URBs rather than kill them to ensure that even >> + * after the completion handler returns, any asynchronous workqueues >> + * will be prevented from resubmitting the URBs >> + */ >> + for_each_uvc_urb(uvc_urb, stream) >> + usb_poison_urb(uvc_urb->urb); >> >> - urb = uvc_urb->urb; >> - if (urb == NULL) >> - continue; >> + flush_workqueue(stream->async_wq); >> >> - usb_kill_urb(urb); >> - usb_free_urb(urb); >> + for_each_uvc_urb(uvc_urb, stream) { >> + usb_free_urb(uvc_urb->urb); >> uvc_urb->urb = NULL; >> } >> >> if (free_buffers) >> uvc_free_urb_buffers(stream); >> + >> + destroy_workqueue(stream->async_wq); > > In our testing, this function ends up being called twice, if before > suspend the camera is streaming and if the camera disconnects between > suspend and resume. This is because uvc_video_suspend() calls this > function (with free_buffers = 0), but uvc_video_resume() wouldn't call > uvc_init_video() due to an earlier failure and uvc_v4l2_release() > would end up calling this function again, while the workqueue is > already destroyed. > > The following diff seems to take care of it: Thank you for this. After discussing with Laurent, I have gone with the approach of keeping the workqueue for the lifetime of the stream, rather than the lifetime of the streamon. > > 8<~~~ > diff --git a/drivers/media/usb/uvc/uvc_video.c > b/drivers/media/usb/uvc/uvc_video.c > index c5e0ab564b1a..6fb890c8ba67 100644 > --- a/drivers/media/usb/uvc/uvc_video.c > +++ b/drivers/media/usb/uvc/uvc_video.c > @@ -1493,10 +1493,11 @@ static void uvc_uninit_video(struct > uvc_streaming *stream, int free_buffers) > uvc_urb->urb = NULL; > } > > - if (free_buffers) > + if (free_buffers) { > uvc_free_urb_buffers(stream); > - > - destroy_workqueue(stream->async_wq); > + destroy_workqueue(stream->async_wq); > + stream->async_wq = NULL; > + } > } > > /* > @@ -1648,10 +1649,12 @@ static int uvc_init_video(struct uvc_streaming > *stream, gfp_t gfp_flags) > > uvc_video_stats_start(stream); > > - stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, > - 0); > - if (!stream->async_wq) > - return -ENOMEM; > + if (!stream->async_wq) { > + stream->async_wq = alloc_workqueue("uvcvideo", > + WQ_UNBOUND | WQ_HIGHPRI, 0); > + if (!stream->async_wq) > + return -ENOMEM; > + } > > if (intf->num_altsetting > 1) { > struct usb_host_endpoint *best_ep = NULL; > ~~~>8 > > Best regards, > Tomasz >
Hi Kieran, On Wed, Nov 7, 2018 at 12:13 AM Kieran Bingham <kieran.bingham@ideasonboard.com> wrote: > > Hi Tomasz, > > On 07/08/2018 10:54, Tomasz Figa wrote: > > Hi Kieran, > > > > On Wed, Mar 28, 2018 at 1:47 AM Kieran Bingham > > <kieran.bingham@ideasonboard.com> wrote: > > [snip] > >> @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream, > >> */ > >> static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) > >> { > >> - struct urb *urb; > >> - unsigned int i; > >> + struct uvc_urb *uvc_urb; > >> > >> uvc_video_stats_stop(stream); > >> > >> - for (i = 0; i < UVC_URBS; ++i) { > >> - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; > >> + /* > >> + * We must poison the URBs rather than kill them to ensure that even > >> + * after the completion handler returns, any asynchronous workqueues > >> + * will be prevented from resubmitting the URBs > >> + */ > >> + for_each_uvc_urb(uvc_urb, stream) > >> + usb_poison_urb(uvc_urb->urb); > >> > >> - urb = uvc_urb->urb; > >> - if (urb == NULL) > >> - continue; > >> + flush_workqueue(stream->async_wq); > >> > >> - usb_kill_urb(urb); > >> - usb_free_urb(urb); > >> + for_each_uvc_urb(uvc_urb, stream) { > >> + usb_free_urb(uvc_urb->urb); > >> uvc_urb->urb = NULL; > >> } > >> > >> if (free_buffers) > >> uvc_free_urb_buffers(stream); > >> + > >> + destroy_workqueue(stream->async_wq); > > > > In our testing, this function ends up being called twice, if before > > suspend the camera is streaming and if the camera disconnects between > > suspend and resume. This is because uvc_video_suspend() calls this > > function (with free_buffers = 0), but uvc_video_resume() wouldn't call > > uvc_init_video() due to an earlier failure and uvc_v4l2_release() > > would end up calling this function again, while the workqueue is > > already destroyed. > > > > The following diff seems to take care of it: > > Thank you for this. After discussing with Laurent, I have gone with the > approach of keeping the workqueue for the lifetime of the stream, rather > than the lifetime of the streamon. > Sounds good to me. Thanks for heads up! Best regards, Tomasz
diff --git a/drivers/media/usb/uvc/uvc_video.c b/drivers/media/usb/uvc/uvc_video.c index 7dd0dcb457f3..a62e8caf367c 100644 --- a/drivers/media/usb/uvc/uvc_video.c +++ b/drivers/media/usb/uvc/uvc_video.c @@ -1042,21 +1042,54 @@ static int uvc_video_decode_start(struct uvc_streaming *stream, return data[0]; } -static void uvc_video_decode_data(struct uvc_streaming *stream, +/* + * uvc_video_decode_data_work: Asynchronous memcpy processing + * + * Perform memcpy tasks in process context, with completion handlers + * to return the URB, and buffer handles. + */ +static void uvc_video_copy_data_work(struct work_struct *work) +{ + struct uvc_urb *uvc_urb = container_of(work, struct uvc_urb, work); + unsigned int i; + int ret; + + for (i = 0; i < uvc_urb->async_operations; i++) { + struct uvc_copy_op *op = &uvc_urb->copy_operations[i]; + + memcpy(op->dst, op->src, op->len); + + /* Release reference taken on this buffer */ + uvc_queue_buffer_release(op->buf); + } + + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); + if (ret < 0) + uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", + ret); +} + +static void uvc_video_decode_data(struct uvc_urb *uvc_urb, struct uvc_buffer *buf, const u8 *data, int len) { - unsigned int maxlen, nbytes; - void *mem; + unsigned int active_op = uvc_urb->async_operations; + struct uvc_copy_op *decode = &uvc_urb->copy_operations[active_op]; + unsigned int maxlen; if (len <= 0) return; - /* Copy the video data to the buffer. */ maxlen = buf->length - buf->bytesused; - mem = buf->mem + buf->bytesused; - nbytes = min((unsigned int)len, maxlen); - memcpy(mem, data, nbytes); - buf->bytesused += nbytes; + + /* Take a buffer reference for async work */ + kref_get(&buf->ref); + + decode->buf = buf; + decode->src = data; + decode->dst = buf->mem + buf->bytesused; + decode->len = min_t(unsigned int, len, maxlen); + + buf->bytesused += decode->len; /* Complete the current frame if the buffer size was exceeded. */ if (len > maxlen) { @@ -1064,6 +1097,8 @@ static void uvc_video_decode_data(struct uvc_streaming *stream, buf->error = 1; buf->state = UVC_BUF_STATE_READY; } + + uvc_urb->async_operations++; } static void uvc_video_decode_end(struct uvc_streaming *stream, @@ -1272,7 +1307,7 @@ static void uvc_video_decode_isoc(struct uvc_urb *uvc_urb, uvc_video_decode_meta(stream, meta_buf, mem, ret); /* Decode the payload data. */ - uvc_video_decode_data(stream, buf, mem + ret, + uvc_video_decode_data(uvc_urb, buf, mem + ret, urb->iso_frame_desc[i].actual_length - ret); /* Process the header again. */ @@ -1334,9 +1369,9 @@ static void uvc_video_decode_bulk(struct uvc_urb *uvc_urb, * sure buf is never dereferenced if NULL. */ - /* Process video data. */ + /* Prepare video data for processing. */ if (!stream->bulk.skip_payload && buf != NULL) - uvc_video_decode_data(stream, buf, mem, len); + uvc_video_decode_data(uvc_urb, buf, mem, len); /* Detect the payload end by a URB smaller than the maximum size (or * a payload size equal to the maximum) and process the header again. @@ -1422,7 +1457,7 @@ static void uvc_video_complete(struct urb *urb) uvc_printk(KERN_WARNING, "Non-zero status (%d) in video " "completion handler.\n", urb->status); /* fall through */ - case -ENOENT: /* usb_kill_urb() called. */ + case -ENOENT: /* usb_poison_urb() called. */ if (stream->frozen) return; /* fall through */ @@ -1436,6 +1471,9 @@ static void uvc_video_complete(struct urb *urb) buf = uvc_queue_get_current_buffer(queue); + /* Re-initialise the URB async work. */ + uvc_urb->async_operations = 0; + if (vb2_qmeta) { spin_lock_irqsave(&qmeta->irqlock, flags); if (!list_empty(&qmeta->irqqueue)) @@ -1444,12 +1482,24 @@ static void uvc_video_complete(struct urb *urb) spin_unlock_irqrestore(&qmeta->irqlock, flags); } + /* + * Process the URB headers, and optionally queue expensive memcpy tasks + * to be deferred to a work queue. + */ stream->decode(uvc_urb, buf, buf_meta); - if ((ret = usb_submit_urb(urb, GFP_ATOMIC)) < 0) { - uvc_printk(KERN_ERR, "Failed to resubmit video URB (%d).\n", - ret); + /* If no async work is needed, resubmit the URB immediately. */ + if (!uvc_urb->async_operations) { + ret = usb_submit_urb(uvc_urb->urb, GFP_ATOMIC); + if (ret < 0) + uvc_printk(KERN_ERR, + "Failed to resubmit video URB (%d).\n", + ret); + return; } + + INIT_WORK(&uvc_urb->work, uvc_video_copy_data_work); + queue_work(stream->async_wq, &uvc_urb->work); } /* @@ -1544,25 +1594,29 @@ static int uvc_alloc_urb_buffers(struct uvc_streaming *stream, */ static void uvc_uninit_video(struct uvc_streaming *stream, int free_buffers) { - struct urb *urb; - unsigned int i; + struct uvc_urb *uvc_urb; uvc_video_stats_stop(stream); - for (i = 0; i < UVC_URBS; ++i) { - struct uvc_urb *uvc_urb = &stream->uvc_urb[i]; + /* + * We must poison the URBs rather than kill them to ensure that even + * after the completion handler returns, any asynchronous workqueues + * will be prevented from resubmitting the URBs + */ + for_each_uvc_urb(uvc_urb, stream) + usb_poison_urb(uvc_urb->urb); - urb = uvc_urb->urb; - if (urb == NULL) - continue; + flush_workqueue(stream->async_wq); - usb_kill_urb(urb); - usb_free_urb(urb); + for_each_uvc_urb(uvc_urb, stream) { + usb_free_urb(uvc_urb->urb); uvc_urb->urb = NULL; } if (free_buffers) uvc_free_urb_buffers(stream); + + destroy_workqueue(stream->async_wq); } /* @@ -1720,6 +1774,11 @@ static int uvc_init_video(struct uvc_streaming *stream, gfp_t gfp_flags) uvc_video_stats_start(stream); + stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, + 0); + if (!stream->async_wq) + return -ENOMEM; + if (intf->num_altsetting > 1) { struct usb_host_endpoint *best_ep = NULL; unsigned int best_psize = UINT_MAX; diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index 112eed49bf50..27c230430eda 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -485,12 +485,30 @@ struct uvc_stats_stream { #define UVC_METATADA_BUF_SIZE 1024 /** + * struct uvc_copy_op: Context structure to schedule asynchronous memcpy + * + * @buf: active buf object for this operation + * @dst: copy destination address + * @src: copy source address + * @len: copy length + */ +struct uvc_copy_op { + struct uvc_buffer *buf; + void *dst; + const __u8 *src; + size_t len; +}; + +/** * struct uvc_urb - URB context management structure * * @urb: the URB described by this context structure * @stream: UVC streaming context * @buffer: memory storage for the URB * @dma: DMA coherent addressing for the urb_buffer + * @async_operations: counter to indicate the number of copy operations + * @copy_operations: work descriptors for asynchronous copy operations + * @work: work queue entry for asynchronous decode */ struct uvc_urb { struct urb *urb; @@ -498,6 +516,10 @@ struct uvc_urb { char *buffer; dma_addr_t dma; + + unsigned int async_operations; + struct uvc_copy_op copy_operations[UVC_MAX_PACKETS]; + struct work_struct work; }; struct uvc_streaming { @@ -530,6 +552,7 @@ struct uvc_streaming { /* Buffers queue. */ unsigned int frozen : 1; struct uvc_video_queue queue; + struct workqueue_struct *async_wq; void (*decode)(struct uvc_urb *uvc_urb, struct uvc_buffer *buf, struct uvc_buffer *meta_buf); @@ -583,6 +606,11 @@ struct uvc_streaming { } clock; }; +#define for_each_uvc_urb(uvc_urb, uvc_streaming) \ + for (uvc_urb = &uvc_streaming->uvc_urb[0]; \ + uvc_urb < &uvc_streaming->uvc_urb[UVC_URBS]; \ + ++uvc_urb) + struct uvc_device { struct usb_device *udev; struct usb_interface *intf;
Newer high definition cameras, and cameras with multiple lenses such as the range of stereo-vision cameras now available have ever increasing data rates. The inclusion of a variable length packet header in URB packets mean that we must memcpy the frame data out to our destination 'manually'. This can result in data rates of up to 2 gigabits per second being processed. To improve efficiency, and maximise throughput, handle the URB decode processing through a work queue to move it from interrupt context, and allow multiple processors to work on URBs in parallel. Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com> --- v2: - Lock full critical section of usb_submit_urb() v3: - Fix race on submitting uvc_video_decode_data_work() to work queue. - Rename uvc_decode_op -> uvc_copy_op (Generic to encode/decode) - Rename decodes -> copy_operations - Don't queue work if there is no async task - obtain copy op structure directly in uvc_video_decode_data() - uvc_video_decode_data_work() -> uvc_video_copy_data_work() v4: - Provide for_each_uvc_urb() - Simplify fix for shutdown race to flush queue before freeing URBs - Rebase to v4.16-rc4 (linux-media/master) adjusting for metadata conflicts. drivers/media/usb/uvc/uvc_video.c | 107 ++++++++++++++++++++++++------- drivers/media/usb/uvc/uvcvideo.h | 28 ++++++++- 2 files changed, 111 insertions(+), 24 deletions(-)