diff mbox series

[v2,2/4] media: uvcvideo: Do not set an async control owned by other fh

Message ID 20241127-uvc-fix-async-v2-2-510aab9570dd@chromium.org (mailing list archive)
State New
Headers show
Series media: uvcvideo: Two fixes for async controls | expand

Commit Message

Ricardo Ribalda Nov. 27, 2024, 12:14 p.m. UTC
If a file handle is waiting for a response from an async control, avoid
that other file handle operate with it.

Without this patch, the first file handle will never get the event
associated with that operation, which can lead to endless loops in
applications. Eg:
If an application A wants to change the zoom and to know when the
operation has completed:
it will open the video node, subscribe to the zoom event, change the
control and wait for zoom to finish.
If before the zoom operation finishes, another application B changes
the zoom, the first app A will loop forever.

Cc: stable@vger.kernel.org
Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
 drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Laurent Pinchart Nov. 28, 2024, 10:22 p.m. UTC | #1
Hi Ricardo,

(CC'ing Hans Verkuil)

Thank you for the patch.

On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> If a file handle is waiting for a response from an async control, avoid
> that other file handle operate with it.
> 
> Without this patch, the first file handle will never get the event
> associated with that operation, which can lead to endless loops in
> applications. Eg:
> If an application A wants to change the zoom and to know when the
> operation has completed:
> it will open the video node, subscribe to the zoom event, change the
> control and wait for zoom to finish.
> If before the zoom operation finishes, another application B changes
> the zoom, the first app A will loop forever.

Hans, the V4L2 specification isn't very clear on this. I see pros and
cons for both behaviours, with a preference for the current behaviour,
as with this patch the control will remain busy until the file handle is
closed if the device doesn't send the control event for any reason. What
do you think ?

> Cc: stable@vger.kernel.org
> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> ---
>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> index b6af4ff92cbd..3f8ae35cb3bc 100644
> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>  	if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>  		return -EACCES;
>  
> +	/* Other file handle is waiting a response from this async control. */
> +	if (ctrl->handle && ctrl->handle != handle)
> +		return -EBUSY;
> +
>  	/* Clamp out of range values. */
>  	switch (mapping->v4l2_type) {
>  	case V4L2_CTRL_TYPE_INTEGER:
Ricardo Ribalda Nov. 28, 2024, 10:28 p.m. UTC | #2
On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> (CC'ing Hans Verkuil)
>
> Thank you for the patch.
>
> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> > If a file handle is waiting for a response from an async control, avoid
> > that other file handle operate with it.
> >
> > Without this patch, the first file handle will never get the event
> > associated with that operation, which can lead to endless loops in
> > applications. Eg:
> > If an application A wants to change the zoom and to know when the
> > operation has completed:
> > it will open the video node, subscribe to the zoom event, change the
> > control and wait for zoom to finish.
> > If before the zoom operation finishes, another application B changes
> > the zoom, the first app A will loop forever.
>
> Hans, the V4L2 specification isn't very clear on this. I see pros and
> cons for both behaviours, with a preference for the current behaviour,
> as with this patch the control will remain busy until the file handle is
> closed if the device doesn't send the control event for any reason. What
> do you think ?

Just one small clarification. The same file handler can change the
value of the async control as many times as it wants, even if the
operation has not finished.

It will be other file handles that will get -EBUSY if they try to use
an async control with an unfinished operation started by another fh.

>
> > Cc: stable@vger.kernel.org
> > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > ---
> >  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > index b6af4ff92cbd..3f8ae35cb3bc 100644
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >               return -EACCES;
> >
> > +     /* Other file handle is waiting a response from this async control. */
> > +     if (ctrl->handle && ctrl->handle != handle)
> > +             return -EBUSY;
> > +
> >       /* Clamp out of range values. */
> >       switch (mapping->v4l2_type) {
> >       case V4L2_CTRL_TYPE_INTEGER:
>
> --
> Regards,
>
> Laurent Pinchart
Laurent Pinchart Nov. 28, 2024, 10:33 p.m. UTC | #3
On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> >
> > Hi Ricardo,
> >
> > (CC'ing Hans Verkuil)
> >
> > Thank you for the patch.
> >
> > On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> > > If a file handle is waiting for a response from an async control, avoid
> > > that other file handle operate with it.
> > >
> > > Without this patch, the first file handle will never get the event
> > > associated with that operation, which can lead to endless loops in
> > > applications. Eg:
> > > If an application A wants to change the zoom and to know when the
> > > operation has completed:
> > > it will open the video node, subscribe to the zoom event, change the
> > > control and wait for zoom to finish.
> > > If before the zoom operation finishes, another application B changes
> > > the zoom, the first app A will loop forever.
> >
> > Hans, the V4L2 specification isn't very clear on this. I see pros and
> > cons for both behaviours, with a preference for the current behaviour,
> > as with this patch the control will remain busy until the file handle is
> > closed if the device doesn't send the control event for any reason. What
> > do you think ?
> 
> Just one small clarification. The same file handler can change the
> value of the async control as many times as it wants, even if the
> operation has not finished.
> 
> It will be other file handles that will get -EBUSY if they try to use
> an async control with an unfinished operation started by another fh.

Yes, I should have been more precised. If the device doesn't send the
control event, then all other file handles will be prevented from
setting the control until the file handle that set it first gets closed.

> > > Cc: stable@vger.kernel.org
> > > Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > > ---
> > >  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > index b6af4ff92cbd..3f8ae35cb3bc 100644
> > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> > >       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> > >               return -EACCES;
> > >
> > > +     /* Other file handle is waiting a response from this async control. */
> > > +     if (ctrl->handle && ctrl->handle != handle)
> > > +             return -EBUSY;
> > > +
> > >       /* Clamp out of range values. */
> > >       switch (mapping->v4l2_type) {
> > >       case V4L2_CTRL_TYPE_INTEGER:
Hans Verkuil Nov. 29, 2024, 10:36 a.m. UTC | #4
Hi Laurent, Ricardo,

On 28/11/2024 23:33, Laurent Pinchart wrote:
> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
>>>
>>> Hi Ricardo,
>>>
>>> (CC'ing Hans Verkuil)
>>>
>>> Thank you for the patch.
>>>
>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
>>>> If a file handle is waiting for a response from an async control, avoid
>>>> that other file handle operate with it.
>>>>
>>>> Without this patch, the first file handle will never get the event
>>>> associated with that operation, which can lead to endless loops in
>>>> applications. Eg:
>>>> If an application A wants to change the zoom and to know when the
>>>> operation has completed:
>>>> it will open the video node, subscribe to the zoom event, change the
>>>> control and wait for zoom to finish.
>>>> If before the zoom operation finishes, another application B changes
>>>> the zoom, the first app A will loop forever.
>>>
>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
>>> cons for both behaviours, with a preference for the current behaviour,
>>> as with this patch the control will remain busy until the file handle is
>>> closed if the device doesn't send the control event for any reason. What
>>> do you think ?
>>
>> Just one small clarification. The same file handler can change the
>> value of the async control as many times as it wants, even if the
>> operation has not finished.
>>
>> It will be other file handles that will get -EBUSY if they try to use
>> an async control with an unfinished operation started by another fh.
> 
> Yes, I should have been more precised. If the device doesn't send the
> control event, then all other file handles will be prevented from
> setting the control until the file handle that set it first gets closed.

I think I need a bit more background here:

First of all, what is an asynchronous control in UVC? I think that means
you can set it, but it takes time for that operation to finish, so you
get an event later when the operation is done. So zoom and similar operations
are examples of that.

And only when the operation finishes will the control event be sent, correct?

While the operation is ongoing, if you query the control value, is that reporting
the current position or the final position?

E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
value: will that report the intermediate values until it reaches 10? And when it is
at 10, the control event is sent?

Regards,

	Hans

> 
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
>>>> ---
>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>>>>               return -EACCES;
>>>>
>>>> +     /* Other file handle is waiting a response from this async control. */
>>>> +     if (ctrl->handle && ctrl->handle != handle)
>>>> +             return -EBUSY;
>>>> +
>>>>       /* Clamp out of range values. */
>>>>       switch (mapping->v4l2_type) {
>>>>       case V4L2_CTRL_TYPE_INTEGER:
>
Ricardo Ribalda Nov. 29, 2024, 10:59 a.m. UTC | #5
On Fri, 29 Nov 2024 at 11:36, Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> Hi Laurent, Ricardo,
>
> On 28/11/2024 23:33, Laurent Pinchart wrote:
> > On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> >> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> >>>
> >>> Hi Ricardo,
> >>>
> >>> (CC'ing Hans Verkuil)
> >>>
> >>> Thank you for the patch.
> >>>
> >>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> >>>> If a file handle is waiting for a response from an async control, avoid
> >>>> that other file handle operate with it.
> >>>>
> >>>> Without this patch, the first file handle will never get the event
> >>>> associated with that operation, which can lead to endless loops in
> >>>> applications. Eg:
> >>>> If an application A wants to change the zoom and to know when the
> >>>> operation has completed:
> >>>> it will open the video node, subscribe to the zoom event, change the
> >>>> control and wait for zoom to finish.
> >>>> If before the zoom operation finishes, another application B changes
> >>>> the zoom, the first app A will loop forever.
> >>>
> >>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> >>> cons for both behaviours, with a preference for the current behaviour,
> >>> as with this patch the control will remain busy until the file handle is
> >>> closed if the device doesn't send the control event for any reason. What
> >>> do you think ?
> >>
> >> Just one small clarification. The same file handler can change the
> >> value of the async control as many times as it wants, even if the
> >> operation has not finished.
> >>
> >> It will be other file handles that will get -EBUSY if they try to use
> >> an async control with an unfinished operation started by another fh.
> >
> > Yes, I should have been more precised. If the device doesn't send the
> > control event, then all other file handles will be prevented from
> > setting the control until the file handle that set it first gets closed.
>
> I think I need a bit more background here:
>
> First of all, what is an asynchronous control in UVC? I think that means
> you can set it, but it takes time for that operation to finish, so you
> get an event later when the operation is done. So zoom and similar operations
> are examples of that.
>
> And only when the operation finishes will the control event be sent, correct?

You are correct.  This diagrams from the spec is more or less clear:
https://ibb.co/MDGn7F3

>
> While the operation is ongoing, if you query the control value, is that reporting
> the current position or the final position?

I'd expect hardware will return either the current position, the start
position or the final position. I could not find anything in the spec
that points in one direction or the others.

And in the driver I believe that we might have a bug handling this
case (will send a patch if I can confirm it)
the zoom is at 0 and you set it 10
if you read the value 2 times before the camera reaches value 10:
- First value will come from the hardware and the response will be cached
- Second value will be the cached one

now the camera  is at zoom 10
If you read the value, you will read the cached value


>
> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> value: will that report the intermediate values until it reaches 10? And when it is
> at 10, the control event is sent?
>
> Regards,
>
>         Hans
>
> >
> >>>> Cc: stable@vger.kernel.org
> >>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> >>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> >>>> ---
> >>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> >>>>  1 file changed, 4 insertions(+)
> >>>>
> >>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> >>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >>>>               return -EACCES;
> >>>>
> >>>> +     /* Other file handle is waiting a response from this async control. */
> >>>> +     if (ctrl->handle && ctrl->handle != handle)
> >>>> +             return -EBUSY;
> >>>> +
> >>>>       /* Clamp out of range values. */
> >>>>       switch (mapping->v4l2_type) {
> >>>>       case V4L2_CTRL_TYPE_INTEGER:
> >
>
Laurent Pinchart Nov. 29, 2024, 11:01 a.m. UTC | #6
On Fri, Nov 29, 2024 at 11:36:36AM +0100, Hans Verkuil wrote:
> Hi Laurent, Ricardo,
> 
> On 28/11/2024 23:33, Laurent Pinchart wrote:
> > On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> >> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> >>>
> >>> Hi Ricardo,
> >>>
> >>> (CC'ing Hans Verkuil)
> >>>
> >>> Thank you for the patch.
> >>>
> >>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> >>>> If a file handle is waiting for a response from an async control, avoid
> >>>> that other file handle operate with it.
> >>>>
> >>>> Without this patch, the first file handle will never get the event
> >>>> associated with that operation, which can lead to endless loops in
> >>>> applications. Eg:
> >>>> If an application A wants to change the zoom and to know when the
> >>>> operation has completed:
> >>>> it will open the video node, subscribe to the zoom event, change the
> >>>> control and wait for zoom to finish.
> >>>> If before the zoom operation finishes, another application B changes
> >>>> the zoom, the first app A will loop forever.
> >>>
> >>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> >>> cons for both behaviours, with a preference for the current behaviour,
> >>> as with this patch the control will remain busy until the file handle is
> >>> closed if the device doesn't send the control event for any reason. What
> >>> do you think ?
> >>
> >> Just one small clarification. The same file handler can change the
> >> value of the async control as many times as it wants, even if the
> >> operation has not finished.
> >>
> >> It will be other file handles that will get -EBUSY if they try to use
> >> an async control with an unfinished operation started by another fh.
> > 
> > Yes, I should have been more precised. If the device doesn't send the
> > control event, then all other file handles will be prevented from
> > setting the control until the file handle that set it first gets closed.
> 
> I think I need a bit more background here:
> 
> First of all, what is an asynchronous control in UVC? I think that means
> you can set it, but it takes time for that operation to finish, so you
> get an event later when the operation is done. So zoom and similar operations
> are examples of that.

Correct. Physical PTZ (pan/tilt/zoom) is the prime use case. The UVC
specification states that any control that requires more than 10ms to
respond to a SET_CUR request must be reported by the device as an
asynchronous control, but does not tell which control must or must not
be asynchronous.

> And only when the operation finishes will the control event be sent, correct?

Correct.

> While the operation is ongoing, if you query the control value, is that reporting
> the current position or the final position?

The UVC specification indicates that a GET_CUR or SET_CUR request on an
asynchronous control will stall until the control value has been
applied. This means you won't be able to poll the control value.

In practice, devices often take liberties with the implementation of the
specification, so some may let you get or set the value of an
asynchronous control while a SET_CUR request is in progress.

> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> value: will that report the intermediate values until it reaches 10? And when it is
> at 10, the control event is sent?
>
> >>>> Cc: stable@vger.kernel.org
> >>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> >>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> >>>> ---
> >>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> >>>>  1 file changed, 4 insertions(+)
> >>>>
> >>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> >>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >>>>               return -EACCES;
> >>>>
> >>>> +     /* Other file handle is waiting a response from this async control. */
> >>>> +     if (ctrl->handle && ctrl->handle != handle)
> >>>> +             return -EBUSY;
> >>>> +
> >>>>       /* Clamp out of range values. */
> >>>>       switch (mapping->v4l2_type) {
> >>>>       case V4L2_CTRL_TYPE_INTEGER:
Laurent Pinchart Nov. 29, 2024, 11:06 a.m. UTC | #7
On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
> > On 28/11/2024 23:33, Laurent Pinchart wrote:
> > > On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> > >> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> > >>>
> > >>> Hi Ricardo,
> > >>>
> > >>> (CC'ing Hans Verkuil)
> > >>>
> > >>> Thank you for the patch.
> > >>>
> > >>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> > >>>> If a file handle is waiting for a response from an async control, avoid
> > >>>> that other file handle operate with it.
> > >>>>
> > >>>> Without this patch, the first file handle will never get the event
> > >>>> associated with that operation, which can lead to endless loops in
> > >>>> applications. Eg:
> > >>>> If an application A wants to change the zoom and to know when the
> > >>>> operation has completed:
> > >>>> it will open the video node, subscribe to the zoom event, change the
> > >>>> control and wait for zoom to finish.
> > >>>> If before the zoom operation finishes, another application B changes
> > >>>> the zoom, the first app A will loop forever.
> > >>>
> > >>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> > >>> cons for both behaviours, with a preference for the current behaviour,
> > >>> as with this patch the control will remain busy until the file handle is
> > >>> closed if the device doesn't send the control event for any reason. What
> > >>> do you think ?
> > >>
> > >> Just one small clarification. The same file handler can change the
> > >> value of the async control as many times as it wants, even if the
> > >> operation has not finished.
> > >>
> > >> It will be other file handles that will get -EBUSY if they try to use
> > >> an async control with an unfinished operation started by another fh.
> > >
> > > Yes, I should have been more precised. If the device doesn't send the
> > > control event, then all other file handles will be prevented from
> > > setting the control until the file handle that set it first gets closed.
> >
> > I think I need a bit more background here:
> >
> > First of all, what is an asynchronous control in UVC? I think that means
> > you can set it, but it takes time for that operation to finish, so you
> > get an event later when the operation is done. So zoom and similar operations
> > are examples of that.
> >
> > And only when the operation finishes will the control event be sent, correct?
> 
> You are correct.  This diagrams from the spec is more or less clear:
> https://ibb.co/MDGn7F3
> 
> > While the operation is ongoing, if you query the control value, is that reporting
> > the current position or the final position?
> 
> I'd expect hardware will return either the current position, the start
> position or the final position. I could not find anything in the spec
> that points in one direction or the others.

Figure 2-21 in UVC 1.5 indicates that the device should STALL the
GET_CUR and SET_CUR requests if a state change is in progress.

> And in the driver I believe that we might have a bug handling this
> case (will send a patch if I can confirm it)
> the zoom is at 0 and you set it 10
> if you read the value 2 times before the camera reaches value 10:
> - First value will come from the hardware and the response will be cached

Only if the control doesn't have the auto-update flag set, so it will be
device-dependent. As GET_CUR should stall that's not really relevant,
except for the fact that devices may not stall the request.

> - Second value will be the cached one
> 
> now the camera  is at zoom 10
> If you read the value, you will read the cached value
>
> > E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> > value: will that report the intermediate values until it reaches 10? And when it is
> > at 10, the control event is sent?
> >
> > >>>> Cc: stable@vger.kernel.org
> > >>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > >>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > >>>> ---
> > >>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> > >>>>  1 file changed, 4 insertions(+)
> > >>>>
> > >>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> > >>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> > >>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> > >>>>               return -EACCES;
> > >>>>
> > >>>> +     /* Other file handle is waiting a response from this async control. */
> > >>>> +     if (ctrl->handle && ctrl->handle != handle)
> > >>>> +             return -EBUSY;
> > >>>> +
> > >>>>       /* Clamp out of range values. */
> > >>>>       switch (mapping->v4l2_type) {
> > >>>>       case V4L2_CTRL_TYPE_INTEGER:
Ricardo Ribalda Nov. 29, 2024, 11:54 a.m. UTC | #8
On Fri, 29 Nov 2024 at 12:06, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
> > On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
> > > On 28/11/2024 23:33, Laurent Pinchart wrote:
> > > > On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> > > >> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> > > >>>
> > > >>> Hi Ricardo,
> > > >>>
> > > >>> (CC'ing Hans Verkuil)
> > > >>>
> > > >>> Thank you for the patch.
> > > >>>
> > > >>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> > > >>>> If a file handle is waiting for a response from an async control, avoid
> > > >>>> that other file handle operate with it.
> > > >>>>
> > > >>>> Without this patch, the first file handle will never get the event
> > > >>>> associated with that operation, which can lead to endless loops in
> > > >>>> applications. Eg:
> > > >>>> If an application A wants to change the zoom and to know when the
> > > >>>> operation has completed:
> > > >>>> it will open the video node, subscribe to the zoom event, change the
> > > >>>> control and wait for zoom to finish.
> > > >>>> If before the zoom operation finishes, another application B changes
> > > >>>> the zoom, the first app A will loop forever.
> > > >>>
> > > >>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> > > >>> cons for both behaviours, with a preference for the current behaviour,
> > > >>> as with this patch the control will remain busy until the file handle is
> > > >>> closed if the device doesn't send the control event for any reason. What
> > > >>> do you think ?
> > > >>
> > > >> Just one small clarification. The same file handler can change the
> > > >> value of the async control as many times as it wants, even if the
> > > >> operation has not finished.
> > > >>
> > > >> It will be other file handles that will get -EBUSY if they try to use
> > > >> an async control with an unfinished operation started by another fh.
> > > >
> > > > Yes, I should have been more precised. If the device doesn't send the
> > > > control event, then all other file handles will be prevented from
> > > > setting the control until the file handle that set it first gets closed.
> > >
> > > I think I need a bit more background here:
> > >
> > > First of all, what is an asynchronous control in UVC? I think that means
> > > you can set it, but it takes time for that operation to finish, so you
> > > get an event later when the operation is done. So zoom and similar operations
> > > are examples of that.
> > >
> > > And only when the operation finishes will the control event be sent, correct?
> >
> > You are correct.  This diagrams from the spec is more or less clear:
> > https://ibb.co/MDGn7F3
> >
> > > While the operation is ongoing, if you query the control value, is that reporting
> > > the current position or the final position?
> >
> > I'd expect hardware will return either the current position, the start
> > position or the final position. I could not find anything in the spec
> > that points in one direction or the others.
>
> Figure 2-21 in UVC 1.5 indicates that the device should STALL the
> GET_CUR and SET_CUR requests if a state change is in progress.
>
> > And in the driver I believe that we might have a bug handling this
> > case (will send a patch if I can confirm it)
> > the zoom is at 0 and you set it 10
> > if you read the value 2 times before the camera reaches value 10:
> > - First value will come from the hardware and the response will be cached
>
> Only if the control doesn't have the auto-update flag set, so it will be
> device-dependent. As GET_CUR should stall that's not really relevant,
> except for the fact that devices may not stall the request.

I missed that the device will likely stall during async operations.

What do you think of something like this? I believe it can work with
compliant and non compliant devices.
Note that the event will be received by the device that originated the
operation, not to the second one that might receive an error during
write/read.



diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index 4fe26e82e3d1..9a86c912e7a2 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1826,14 +1826,15 @@ static int uvc_ctrl_commit_entity(struct
uvc_device *dev,
                        continue;

                /*
-                * Reset the loaded flag for auto-update controls that were
+                * Reset the loaded flag for auto-update controls and for
+                * asynchronous controls with pending operations, that were
                 * marked as loaded in uvc_ctrl_get/uvc_ctrl_set to prevent
                 * uvc_ctrl_get from using the cached value, and for write-only
                 * controls to prevent uvc_ctrl_set from setting bits not
                 * explicitly set by the user.
                 */
                if (ctrl->info.flags & UVC_CTRL_FLAG_AUTO_UPDATE ||
-                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR))
+                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) || ctrl->handle)
                        ctrl->loaded = 0;

                if (!ctrl->dirty)
@@ -2046,8 +2047,18 @@ int uvc_ctrl_set(struct uvc_fh *handle,
        mapping->set(mapping, value,
                uvc_ctrl_data(ctrl, UVC_CTRL_DATA_CURRENT));

-       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
-               ctrl->handle = handle;
+       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) {
+               /*
+                * Other file handle is waiting for an operation on
+                * this asynchronous control. If the device is compliant
+                * this operation will fail.
+                *
+                * Do not replace the handle pointer, so the original file
+                * descriptor will get the completion event.
+                */
+               if (!ctrl->handle)
+                       ctrl->handle = handle;
+       }

        ctrl->dirty = 1;
        ctrl->modified = 1;

>
> > - Second value will be the cached one
> >
> > now the camera  is at zoom 10
> > If you read the value, you will read the cached value
> >
> > > E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> > > value: will that report the intermediate values until it reaches 10? And when it is
> > > at 10, the control event is sent?
> > >
> > > >>>> Cc: stable@vger.kernel.org
> > > >>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > > >>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > > >>>> ---
> > > >>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> > > >>>>  1 file changed, 4 insertions(+)
> > > >>>>
> > > >>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > >>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> > > >>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > >>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > >>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> > > >>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> > > >>>>               return -EACCES;
> > > >>>>
> > > >>>> +     /* Other file handle is waiting a response from this async control. */
> > > >>>> +     if (ctrl->handle && ctrl->handle != handle)
> > > >>>> +             return -EBUSY;
> > > >>>> +
> > > >>>>       /* Clamp out of range values. */
> > > >>>>       switch (mapping->v4l2_type) {
> > > >>>>       case V4L2_CTRL_TYPE_INTEGER:
>
> --
> Regards,
>
> Laurent Pinchart
Hans Verkuil Nov. 29, 2024, 1:10 p.m. UTC | #9
On 29/11/2024 12:06, Laurent Pinchart wrote:
> On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
>> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
>>> On 28/11/2024 23:33, Laurent Pinchart wrote:
>>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
>>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
>>>>>>
>>>>>> Hi Ricardo,
>>>>>>
>>>>>> (CC'ing Hans Verkuil)
>>>>>>
>>>>>> Thank you for the patch.
>>>>>>
>>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
>>>>>>> If a file handle is waiting for a response from an async control, avoid
>>>>>>> that other file handle operate with it.
>>>>>>>
>>>>>>> Without this patch, the first file handle will never get the event
>>>>>>> associated with that operation, which can lead to endless loops in
>>>>>>> applications. Eg:
>>>>>>> If an application A wants to change the zoom and to know when the
>>>>>>> operation has completed:
>>>>>>> it will open the video node, subscribe to the zoom event, change the
>>>>>>> control and wait for zoom to finish.
>>>>>>> If before the zoom operation finishes, another application B changes
>>>>>>> the zoom, the first app A will loop forever.
>>>>>>
>>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
>>>>>> cons for both behaviours, with a preference for the current behaviour,
>>>>>> as with this patch the control will remain busy until the file handle is
>>>>>> closed if the device doesn't send the control event for any reason. What
>>>>>> do you think ?
>>>>>
>>>>> Just one small clarification. The same file handler can change the
>>>>> value of the async control as many times as it wants, even if the
>>>>> operation has not finished.
>>>>>
>>>>> It will be other file handles that will get -EBUSY if they try to use
>>>>> an async control with an unfinished operation started by another fh.
>>>>
>>>> Yes, I should have been more precised. If the device doesn't send the
>>>> control event, then all other file handles will be prevented from
>>>> setting the control until the file handle that set it first gets closed.
>>>
>>> I think I need a bit more background here:
>>>
>>> First of all, what is an asynchronous control in UVC? I think that means
>>> you can set it, but it takes time for that operation to finish, so you
>>> get an event later when the operation is done. So zoom and similar operations
>>> are examples of that.
>>>
>>> And only when the operation finishes will the control event be sent, correct?
>>
>> You are correct.  This diagrams from the spec is more or less clear:
>> https://ibb.co/MDGn7F3
>>
>>> While the operation is ongoing, if you query the control value, is that reporting
>>> the current position or the final position?
>>
>> I'd expect hardware will return either the current position, the start
>> position or the final position. I could not find anything in the spec
>> that points in one direction or the others.
> 
> Figure 2-21 in UVC 1.5 indicates that the device should STALL the
> GET_CUR and SET_CUR requests if a state change is in progress.
> 
>> And in the driver I believe that we might have a bug handling this
>> case (will send a patch if I can confirm it)
>> the zoom is at 0 and you set it 10
>> if you read the value 2 times before the camera reaches value 10:
>> - First value will come from the hardware and the response will be cached
> 
> Only if the control doesn't have the auto-update flag set, so it will be
> device-dependent. As GET_CUR should stall that's not really relevant,
> except for the fact that devices may not stall the request.

OK, that helps a lot.

If an operation is in progress, then setting a new control value should
result in -EBUSY. Based on the description above, I gather that even the
same fh that made the request cannot update it while the operation is
ongoing?

Getting the control should just return the value that was set. I assume
that is cached in uvc?

Regards,

	Hans

> 
>> - Second value will be the cached one
>>
>> now the camera  is at zoom 10
>> If you read the value, you will read the cached value
>>
>>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
>>> value: will that report the intermediate values until it reaches 10? And when it is
>>> at 10, the control event is sent?
>>>
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
>>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
>>>>>>> ---
>>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
>>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>>>>>>>               return -EACCES;
>>>>>>>
>>>>>>> +     /* Other file handle is waiting a response from this async control. */
>>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
>>>>>>> +             return -EBUSY;
>>>>>>> +
>>>>>>>       /* Clamp out of range values. */
>>>>>>>       switch (mapping->v4l2_type) {
>>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
>
Hans Verkuil Nov. 29, 2024, 1:13 p.m. UTC | #10
On 29/11/2024 12:54, Ricardo Ribalda wrote:
> On Fri, 29 Nov 2024 at 12:06, Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
>>
>> On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
>>> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
>>>> On 28/11/2024 23:33, Laurent Pinchart wrote:
>>>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
>>>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
>>>>>>>
>>>>>>> Hi Ricardo,
>>>>>>>
>>>>>>> (CC'ing Hans Verkuil)
>>>>>>>
>>>>>>> Thank you for the patch.
>>>>>>>
>>>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
>>>>>>>> If a file handle is waiting for a response from an async control, avoid
>>>>>>>> that other file handle operate with it.
>>>>>>>>
>>>>>>>> Without this patch, the first file handle will never get the event
>>>>>>>> associated with that operation, which can lead to endless loops in
>>>>>>>> applications. Eg:
>>>>>>>> If an application A wants to change the zoom and to know when the
>>>>>>>> operation has completed:
>>>>>>>> it will open the video node, subscribe to the zoom event, change the
>>>>>>>> control and wait for zoom to finish.
>>>>>>>> If before the zoom operation finishes, another application B changes
>>>>>>>> the zoom, the first app A will loop forever.
>>>>>>>
>>>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
>>>>>>> cons for both behaviours, with a preference for the current behaviour,
>>>>>>> as with this patch the control will remain busy until the file handle is
>>>>>>> closed if the device doesn't send the control event for any reason. What
>>>>>>> do you think ?
>>>>>>
>>>>>> Just one small clarification. The same file handler can change the
>>>>>> value of the async control as many times as it wants, even if the
>>>>>> operation has not finished.
>>>>>>
>>>>>> It will be other file handles that will get -EBUSY if they try to use
>>>>>> an async control with an unfinished operation started by another fh.
>>>>>
>>>>> Yes, I should have been more precised. If the device doesn't send the
>>>>> control event, then all other file handles will be prevented from
>>>>> setting the control until the file handle that set it first gets closed.
>>>>
>>>> I think I need a bit more background here:
>>>>
>>>> First of all, what is an asynchronous control in UVC? I think that means
>>>> you can set it, but it takes time for that operation to finish, so you
>>>> get an event later when the operation is done. So zoom and similar operations
>>>> are examples of that.
>>>>
>>>> And only when the operation finishes will the control event be sent, correct?
>>>
>>> You are correct.  This diagrams from the spec is more or less clear:
>>> https://ibb.co/MDGn7F3
>>>
>>>> While the operation is ongoing, if you query the control value, is that reporting
>>>> the current position or the final position?
>>>
>>> I'd expect hardware will return either the current position, the start
>>> position or the final position. I could not find anything in the spec
>>> that points in one direction or the others.
>>
>> Figure 2-21 in UVC 1.5 indicates that the device should STALL the
>> GET_CUR and SET_CUR requests if a state change is in progress.
>>
>>> And in the driver I believe that we might have a bug handling this
>>> case (will send a patch if I can confirm it)
>>> the zoom is at 0 and you set it 10
>>> if you read the value 2 times before the camera reaches value 10:
>>> - First value will come from the hardware and the response will be cached
>>
>> Only if the control doesn't have the auto-update flag set, so it will be
>> device-dependent. As GET_CUR should stall that's not really relevant,
>> except for the fact that devices may not stall the request.
> 
> I missed that the device will likely stall during async operations.
> 
> What do you think of something like this? I believe it can work with
> compliant and non compliant devices.
> Note that the event will be received by the device that originated the
> operation, not to the second one that might receive an error during
> write/read.
> 
> 
> 
> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> index 4fe26e82e3d1..9a86c912e7a2 100644
> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> @@ -1826,14 +1826,15 @@ static int uvc_ctrl_commit_entity(struct
> uvc_device *dev,
>                         continue;
> 
>                 /*
> -                * Reset the loaded flag for auto-update controls that were
> +                * Reset the loaded flag for auto-update controls and for
> +                * asynchronous controls with pending operations, that were
>                  * marked as loaded in uvc_ctrl_get/uvc_ctrl_set to prevent
>                  * uvc_ctrl_get from using the cached value, and for write-only
>                  * controls to prevent uvc_ctrl_set from setting bits not
>                  * explicitly set by the user.
>                  */
>                 if (ctrl->info.flags & UVC_CTRL_FLAG_AUTO_UPDATE ||
> -                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR))
> +                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) || ctrl->handle)
>                         ctrl->loaded = 0;
> 
>                 if (!ctrl->dirty)
> @@ -2046,8 +2047,18 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>         mapping->set(mapping, value,
>                 uvc_ctrl_data(ctrl, UVC_CTRL_DATA_CURRENT));
> 
> -       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
> -               ctrl->handle = handle;
> +       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) {
> +               /*
> +                * Other file handle is waiting for an operation on
> +                * this asynchronous control. If the device is compliant
> +                * this operation will fail.
> +                *
> +                * Do not replace the handle pointer, so the original file
> +                * descriptor will get the completion event.
> +                */
> +               if (!ctrl->handle)
> +                       ctrl->handle = handle;

I don't think this is right: you want the completion event for async
controls to go to all filehandles that are subscribed to that control.

Which is what happens if handle == NULL (as I understand the code).

Regards,

	Hans

> +       }
> 
>         ctrl->dirty = 1;
>         ctrl->modified = 1;
> 
>>
>>> - Second value will be the cached one
>>>
>>> now the camera  is at zoom 10
>>> If you read the value, you will read the cached value
>>>
>>>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
>>>> value: will that report the intermediate values until it reaches 10? And when it is
>>>> at 10, the control event is sent?
>>>>
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
>>>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
>>>>>>>> ---
>>>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
>>>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>>>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>>>>>>>>               return -EACCES;
>>>>>>>>
>>>>>>>> +     /* Other file handle is waiting a response from this async control. */
>>>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
>>>>>>>> +             return -EBUSY;
>>>>>>>> +
>>>>>>>>       /* Clamp out of range values. */
>>>>>>>>       switch (mapping->v4l2_type) {
>>>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
>>
>> --
>> Regards,
>>
>> Laurent Pinchart
> 
> 
>
Hans Verkuil Nov. 29, 2024, 1:32 p.m. UTC | #11
On 29/11/2024 14:10, Hans Verkuil wrote:
> On 29/11/2024 12:06, Laurent Pinchart wrote:
>> On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
>>> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
>>>> On 28/11/2024 23:33, Laurent Pinchart wrote:
>>>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
>>>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
>>>>>>>
>>>>>>> Hi Ricardo,
>>>>>>>
>>>>>>> (CC'ing Hans Verkuil)
>>>>>>>
>>>>>>> Thank you for the patch.
>>>>>>>
>>>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
>>>>>>>> If a file handle is waiting for a response from an async control, avoid
>>>>>>>> that other file handle operate with it.
>>>>>>>>
>>>>>>>> Without this patch, the first file handle will never get the event
>>>>>>>> associated with that operation, which can lead to endless loops in
>>>>>>>> applications. Eg:
>>>>>>>> If an application A wants to change the zoom and to know when the
>>>>>>>> operation has completed:
>>>>>>>> it will open the video node, subscribe to the zoom event, change the
>>>>>>>> control and wait for zoom to finish.
>>>>>>>> If before the zoom operation finishes, another application B changes
>>>>>>>> the zoom, the first app A will loop forever.
>>>>>>>
>>>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
>>>>>>> cons for both behaviours, with a preference for the current behaviour,
>>>>>>> as with this patch the control will remain busy until the file handle is
>>>>>>> closed if the device doesn't send the control event for any reason. What
>>>>>>> do you think ?
>>>>>>
>>>>>> Just one small clarification. The same file handler can change the
>>>>>> value of the async control as many times as it wants, even if the
>>>>>> operation has not finished.
>>>>>>
>>>>>> It will be other file handles that will get -EBUSY if they try to use
>>>>>> an async control with an unfinished operation started by another fh.
>>>>>
>>>>> Yes, I should have been more precised. If the device doesn't send the
>>>>> control event, then all other file handles will be prevented from
>>>>> setting the control until the file handle that set it first gets closed.
>>>>
>>>> I think I need a bit more background here:
>>>>
>>>> First of all, what is an asynchronous control in UVC? I think that means
>>>> you can set it, but it takes time for that operation to finish, so you
>>>> get an event later when the operation is done. So zoom and similar operations
>>>> are examples of that.
>>>>
>>>> And only when the operation finishes will the control event be sent, correct?
>>>
>>> You are correct.  This diagrams from the spec is more or less clear:
>>> https://ibb.co/MDGn7F3
>>>
>>>> While the operation is ongoing, if you query the control value, is that reporting
>>>> the current position or the final position?
>>>
>>> I'd expect hardware will return either the current position, the start
>>> position or the final position. I could not find anything in the spec
>>> that points in one direction or the others.
>>
>> Figure 2-21 in UVC 1.5 indicates that the device should STALL the
>> GET_CUR and SET_CUR requests if a state change is in progress.
>>
>>> And in the driver I believe that we might have a bug handling this
>>> case (will send a patch if I can confirm it)
>>> the zoom is at 0 and you set it 10
>>> if you read the value 2 times before the camera reaches value 10:
>>> - First value will come from the hardware and the response will be cached
>>
>> Only if the control doesn't have the auto-update flag set, so it will be
>> device-dependent. As GET_CUR should stall that's not really relevant,
>> except for the fact that devices may not stall the request.
> 
> OK, that helps a lot.
> 
> If an operation is in progress, then setting a new control value should
> result in -EBUSY. Based on the description above, I gather that even the
> same fh that made the request cannot update it while the operation is
> ongoing?
> 
> Getting the control should just return the value that was set. I assume
> that is cached in uvc?

Actually, it would be cleaner if the old value is returned and only update
the control value when the operation finishes. That also aligns better with
the what the change control event means.

Anyway, I don't think it is big deal in practice.

Regards,

	Hans

> 
> Regards,
> 
> 	Hans
> 
>>
>>> - Second value will be the cached one
>>>
>>> now the camera  is at zoom 10
>>> If you read the value, you will read the cached value
>>>
>>>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
>>>> value: will that report the intermediate values until it reaches 10? And when it is
>>>> at 10, the control event is sent?
>>>>
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
>>>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
>>>>>>>> ---
>>>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
>>>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
>>>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>>>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>>>>>>>>               return -EACCES;
>>>>>>>>
>>>>>>>> +     /* Other file handle is waiting a response from this async control. */
>>>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
>>>>>>>> +             return -EBUSY;
>>>>>>>> +
>>>>>>>>       /* Clamp out of range values. */
>>>>>>>>       switch (mapping->v4l2_type) {
>>>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
>>
> 
>
Ricardo Ribalda Nov. 29, 2024, 1:37 p.m. UTC | #12
On Fri, 29 Nov 2024 at 14:10, Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 29/11/2024 12:06, Laurent Pinchart wrote:
> > On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
> >> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
> >>> On 28/11/2024 23:33, Laurent Pinchart wrote:
> >>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> >>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> >>>>>>
> >>>>>> Hi Ricardo,
> >>>>>>
> >>>>>> (CC'ing Hans Verkuil)
> >>>>>>
> >>>>>> Thank you for the patch.
> >>>>>>
> >>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> >>>>>>> If a file handle is waiting for a response from an async control, avoid
> >>>>>>> that other file handle operate with it.
> >>>>>>>
> >>>>>>> Without this patch, the first file handle will never get the event
> >>>>>>> associated with that operation, which can lead to endless loops in
> >>>>>>> applications. Eg:
> >>>>>>> If an application A wants to change the zoom and to know when the
> >>>>>>> operation has completed:
> >>>>>>> it will open the video node, subscribe to the zoom event, change the
> >>>>>>> control and wait for zoom to finish.
> >>>>>>> If before the zoom operation finishes, another application B changes
> >>>>>>> the zoom, the first app A will loop forever.
> >>>>>>
> >>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> >>>>>> cons for both behaviours, with a preference for the current behaviour,
> >>>>>> as with this patch the control will remain busy until the file handle is
> >>>>>> closed if the device doesn't send the control event for any reason. What
> >>>>>> do you think ?
> >>>>>
> >>>>> Just one small clarification. The same file handler can change the
> >>>>> value of the async control as many times as it wants, even if the
> >>>>> operation has not finished.
> >>>>>
> >>>>> It will be other file handles that will get -EBUSY if they try to use
> >>>>> an async control with an unfinished operation started by another fh.
> >>>>
> >>>> Yes, I should have been more precised. If the device doesn't send the
> >>>> control event, then all other file handles will be prevented from
> >>>> setting the control until the file handle that set it first gets closed.
> >>>
> >>> I think I need a bit more background here:
> >>>
> >>> First of all, what is an asynchronous control in UVC? I think that means
> >>> you can set it, but it takes time for that operation to finish, so you
> >>> get an event later when the operation is done. So zoom and similar operations
> >>> are examples of that.
> >>>
> >>> And only when the operation finishes will the control event be sent, correct?
> >>
> >> You are correct.  This diagrams from the spec is more or less clear:
> >> https://ibb.co/MDGn7F3
> >>
> >>> While the operation is ongoing, if you query the control value, is that reporting
> >>> the current position or the final position?
> >>
> >> I'd expect hardware will return either the current position, the start
> >> position or the final position. I could not find anything in the spec
> >> that points in one direction or the others.
> >
> > Figure 2-21 in UVC 1.5 indicates that the device should STALL the
> > GET_CUR and SET_CUR requests if a state change is in progress.
> >
> >> And in the driver I believe that we might have a bug handling this
> >> case (will send a patch if I can confirm it)
> >> the zoom is at 0 and you set it 10
> >> if you read the value 2 times before the camera reaches value 10:
> >> - First value will come from the hardware and the response will be cached
> >
> > Only if the control doesn't have the auto-update flag set, so it will be
> > device-dependent. As GET_CUR should stall that's not really relevant,
> > except for the fact that devices may not stall the request.
>
> OK, that helps a lot.
>
> If an operation is in progress, then setting a new control value should
> result in -EBUSY. Based on the description above, I gather that even the
> same fh that made the request cannot update it while the operation is
> ongoing?

That is correct according to the spec. But both Laurent (correct me if
I am wrong) and me suspect that there are devices that do not
implement this properly.


>
> Getting the control should just return the value that was set. I assume
> that is cached in uvc?

If I get the code right... we only return the cached value when the
field "loaded" is true. That happens when we read the device. So if
the driver is just loaded

write(full controlA)
read(full controlA)
read(full controlA)

The first read will get the value from the hardware, the second will be cached.


>
> Regards,
>
>         Hans
>
> >
> >> - Second value will be the cached one
> >>
> >> now the camera  is at zoom 10
> >> If you read the value, you will read the cached value
> >>
> >>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> >>> value: will that report the intermediate values until it reaches 10? And when it is
> >>> at 10, the control event is sent?
> >>>
> >>>>>>> Cc: stable@vger.kernel.org
> >>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> >>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> >>>>>>> ---
> >>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> >>>>>>>  1 file changed, 4 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> >>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >>>>>>>               return -EACCES;
> >>>>>>>
> >>>>>>> +     /* Other file handle is waiting a response from this async control. */
> >>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
> >>>>>>> +             return -EBUSY;
> >>>>>>> +
> >>>>>>>       /* Clamp out of range values. */
> >>>>>>>       switch (mapping->v4l2_type) {
> >>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
> >
>


--
Ricardo Ribalda
Ricardo Ribalda Nov. 29, 2024, 1:41 p.m. UTC | #13
On Fri, 29 Nov 2024 at 14:13, Hans Verkuil <hverkuil-cisco@xs4all.nl> wrote:
>
> On 29/11/2024 12:54, Ricardo Ribalda wrote:
> > On Fri, 29 Nov 2024 at 12:06, Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> >>
> >> On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
> >>> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
> >>>> On 28/11/2024 23:33, Laurent Pinchart wrote:
> >>>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> >>>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> >>>>>>>
> >>>>>>> Hi Ricardo,
> >>>>>>>
> >>>>>>> (CC'ing Hans Verkuil)
> >>>>>>>
> >>>>>>> Thank you for the patch.
> >>>>>>>
> >>>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> >>>>>>>> If a file handle is waiting for a response from an async control, avoid
> >>>>>>>> that other file handle operate with it.
> >>>>>>>>
> >>>>>>>> Without this patch, the first file handle will never get the event
> >>>>>>>> associated with that operation, which can lead to endless loops in
> >>>>>>>> applications. Eg:
> >>>>>>>> If an application A wants to change the zoom and to know when the
> >>>>>>>> operation has completed:
> >>>>>>>> it will open the video node, subscribe to the zoom event, change the
> >>>>>>>> control and wait for zoom to finish.
> >>>>>>>> If before the zoom operation finishes, another application B changes
> >>>>>>>> the zoom, the first app A will loop forever.
> >>>>>>>
> >>>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> >>>>>>> cons for both behaviours, with a preference for the current behaviour,
> >>>>>>> as with this patch the control will remain busy until the file handle is
> >>>>>>> closed if the device doesn't send the control event for any reason. What
> >>>>>>> do you think ?
> >>>>>>
> >>>>>> Just one small clarification. The same file handler can change the
> >>>>>> value of the async control as many times as it wants, even if the
> >>>>>> operation has not finished.
> >>>>>>
> >>>>>> It will be other file handles that will get -EBUSY if they try to use
> >>>>>> an async control with an unfinished operation started by another fh.
> >>>>>
> >>>>> Yes, I should have been more precised. If the device doesn't send the
> >>>>> control event, then all other file handles will be prevented from
> >>>>> setting the control until the file handle that set it first gets closed.
> >>>>
> >>>> I think I need a bit more background here:
> >>>>
> >>>> First of all, what is an asynchronous control in UVC? I think that means
> >>>> you can set it, but it takes time for that operation to finish, so you
> >>>> get an event later when the operation is done. So zoom and similar operations
> >>>> are examples of that.
> >>>>
> >>>> And only when the operation finishes will the control event be sent, correct?
> >>>
> >>> You are correct.  This diagrams from the spec is more or less clear:
> >>> https://ibb.co/MDGn7F3
> >>>
> >>>> While the operation is ongoing, if you query the control value, is that reporting
> >>>> the current position or the final position?
> >>>
> >>> I'd expect hardware will return either the current position, the start
> >>> position or the final position. I could not find anything in the spec
> >>> that points in one direction or the others.
> >>
> >> Figure 2-21 in UVC 1.5 indicates that the device should STALL the
> >> GET_CUR and SET_CUR requests if a state change is in progress.
> >>
> >>> And in the driver I believe that we might have a bug handling this
> >>> case (will send a patch if I can confirm it)
> >>> the zoom is at 0 and you set it 10
> >>> if you read the value 2 times before the camera reaches value 10:
> >>> - First value will come from the hardware and the response will be cached
> >>
> >> Only if the control doesn't have the auto-update flag set, so it will be
> >> device-dependent. As GET_CUR should stall that's not really relevant,
> >> except for the fact that devices may not stall the request.
> >
> > I missed that the device will likely stall during async operations.
> >
> > What do you think of something like this? I believe it can work with
> > compliant and non compliant devices.
> > Note that the event will be received by the device that originated the
> > operation, not to the second one that might receive an error during
> > write/read.
> >
> >
> >
> > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > index 4fe26e82e3d1..9a86c912e7a2 100644
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1826,14 +1826,15 @@ static int uvc_ctrl_commit_entity(struct
> > uvc_device *dev,
> >                         continue;
> >
> >                 /*
> > -                * Reset the loaded flag for auto-update controls that were
> > +                * Reset the loaded flag for auto-update controls and for
> > +                * asynchronous controls with pending operations, that were
> >                  * marked as loaded in uvc_ctrl_get/uvc_ctrl_set to prevent
> >                  * uvc_ctrl_get from using the cached value, and for write-only
> >                  * controls to prevent uvc_ctrl_set from setting bits not
> >                  * explicitly set by the user.
> >                  */
> >                 if (ctrl->info.flags & UVC_CTRL_FLAG_AUTO_UPDATE ||
> > -                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR))
> > +                   !(ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) || ctrl->handle)
> >                         ctrl->loaded = 0;
> >
> >                 if (!ctrl->dirty)
> > @@ -2046,8 +2047,18 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >         mapping->set(mapping, value,
> >                 uvc_ctrl_data(ctrl, UVC_CTRL_DATA_CURRENT));
> >
> > -       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS)
> > -               ctrl->handle = handle;
> > +       if (ctrl->info.flags & UVC_CTRL_FLAG_ASYNCHRONOUS) {
> > +               /*
> > +                * Other file handle is waiting for an operation on
> > +                * this asynchronous control. If the device is compliant
> > +                * this operation will fail.
> > +                *
> > +                * Do not replace the handle pointer, so the original file
> > +                * descriptor will get the completion event.
> > +                */
> > +               if (!ctrl->handle)
> > +                       ctrl->handle = handle;
>
> I don't think this is right: you want the completion event for async
> controls to go to all filehandles that are subscribed to that control.
>
> Which is what happens if handle == NULL (as I understand the code).
>
> Regards,

The code is correct, but the comment is not :). It should say:
 * Do not replace the handle pointer, or the originator of
 * the operation will receive an event.

The originar should NOT receive the event.
From uvc_ctrl_send_event():
/*
 * Send control change events to all subscribers for the @ctrl control. By
 * default the subscriber that generated the event, as identified by @handle,
 * is not notified unless it has set the V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK flag.
 * @handle can be NULL for asynchronous events related to auto-update controls,
 * in which case all subscribers are notified.
 */



>
>         Hans
>
> > +       }
> >
> >         ctrl->dirty = 1;
> >         ctrl->modified = 1;
> >
> >>
> >>> - Second value will be the cached one
> >>>
> >>> now the camera  is at zoom 10
> >>> If you read the value, you will read the cached value
> >>>
> >>>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> >>>> value: will that report the intermediate values until it reaches 10? And when it is
> >>>> at 10, the control event is sent?
> >>>>
> >>>>>>>> Cc: stable@vger.kernel.org
> >>>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> >>>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> >>>>>>>> ---
> >>>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> >>>>>>>>  1 file changed, 4 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> >>>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> >>>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >>>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >>>>>>>>               return -EACCES;
> >>>>>>>>
> >>>>>>>> +     /* Other file handle is waiting a response from this async control. */
> >>>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
> >>>>>>>> +             return -EBUSY;
> >>>>>>>> +
> >>>>>>>>       /* Clamp out of range values. */
> >>>>>>>>       switch (mapping->v4l2_type) {
> >>>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
> >>
> >> --
> >> Regards,
> >>
> >> Laurent Pinchart
> >
> >
> >
>
Ricardo Ribalda Nov. 29, 2024, 6:47 p.m. UTC | #14
Before we all go on a well deserved weekend, let me recap what we
know. If I did not get something correctly, let me know.

1) Well behaved devices do not allow to set or get an incomplete async
control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
2) Both Laurent and Ricardo consider that there is a big chance that
some camera modules do not implement this properly. (ref: years of
crying over broken module firmware :) )
3) ctrl->handle is designed to point to the fh that originated the
control. So the logic can decide if the originator needs to be
notified or not. (ref: uvc_ctrl_send_event() )
4) Right now we replace the originator in ctrl->handle for unfinished
async controls.  (ref:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)

My interpretation is that:
A) We need to change 4). We shall not change the originator of
unfinished ctrl->handle.
B) Well behaved cameras do not need the patch "Do not set an async
control owned by another fh"
C) For badly behaved cameras, it is fine if we slightly break the
v4l2-compliance in corner cases, if we do not break any internal data
structure.


I will send a new version with my interpretation.

Thanks for a great discussion
Laurent Pinchart Nov. 29, 2024, 9:47 p.m. UTC | #15
On Fri, Nov 29, 2024 at 02:37:21PM +0100, Ricardo Ribalda wrote:
> On Fri, 29 Nov 2024 at 14:10, Hans Verkuil wrote:
> > On 29/11/2024 12:06, Laurent Pinchart wrote:
> > > On Fri, Nov 29, 2024 at 11:59:27AM +0100, Ricardo Ribalda wrote:
> > >> On Fri, 29 Nov 2024 at 11:36, Hans Verkuil wrote:
> > >>> On 28/11/2024 23:33, Laurent Pinchart wrote:
> > >>>> On Thu, Nov 28, 2024 at 11:28:29PM +0100, Ricardo Ribalda wrote:
> > >>>>> On Thu, 28 Nov 2024 at 23:22, Laurent Pinchart wrote:
> > >>>>>>
> > >>>>>> Hi Ricardo,
> > >>>>>>
> > >>>>>> (CC'ing Hans Verkuil)
> > >>>>>>
> > >>>>>> Thank you for the patch.
> > >>>>>>
> > >>>>>> On Wed, Nov 27, 2024 at 12:14:50PM +0000, Ricardo Ribalda wrote:
> > >>>>>>> If a file handle is waiting for a response from an async control, avoid
> > >>>>>>> that other file handle operate with it.
> > >>>>>>>
> > >>>>>>> Without this patch, the first file handle will never get the event
> > >>>>>>> associated with that operation, which can lead to endless loops in
> > >>>>>>> applications. Eg:
> > >>>>>>> If an application A wants to change the zoom and to know when the
> > >>>>>>> operation has completed:
> > >>>>>>> it will open the video node, subscribe to the zoom event, change the
> > >>>>>>> control and wait for zoom to finish.
> > >>>>>>> If before the zoom operation finishes, another application B changes
> > >>>>>>> the zoom, the first app A will loop forever.
> > >>>>>>
> > >>>>>> Hans, the V4L2 specification isn't very clear on this. I see pros and
> > >>>>>> cons for both behaviours, with a preference for the current behaviour,
> > >>>>>> as with this patch the control will remain busy until the file handle is
> > >>>>>> closed if the device doesn't send the control event for any reason. What
> > >>>>>> do you think ?
> > >>>>>
> > >>>>> Just one small clarification. The same file handler can change the
> > >>>>> value of the async control as many times as it wants, even if the
> > >>>>> operation has not finished.
> > >>>>>
> > >>>>> It will be other file handles that will get -EBUSY if they try to use
> > >>>>> an async control with an unfinished operation started by another fh.
> > >>>>
> > >>>> Yes, I should have been more precised. If the device doesn't send the
> > >>>> control event, then all other file handles will be prevented from
> > >>>> setting the control until the file handle that set it first gets closed.
> > >>>
> > >>> I think I need a bit more background here:
> > >>>
> > >>> First of all, what is an asynchronous control in UVC? I think that means
> > >>> you can set it, but it takes time for that operation to finish, so you
> > >>> get an event later when the operation is done. So zoom and similar operations
> > >>> are examples of that.
> > >>>
> > >>> And only when the operation finishes will the control event be sent, correct?
> > >>
> > >> You are correct.  This diagrams from the spec is more or less clear:
> > >> https://ibb.co/MDGn7F3
> > >>
> > >>> While the operation is ongoing, if you query the control value, is that reporting
> > >>> the current position or the final position?
> > >>
> > >> I'd expect hardware will return either the current position, the start
> > >> position or the final position. I could not find anything in the spec
> > >> that points in one direction or the others.
> > >
> > > Figure 2-21 in UVC 1.5 indicates that the device should STALL the
> > > GET_CUR and SET_CUR requests if a state change is in progress.
> > >
> > >> And in the driver I believe that we might have a bug handling this
> > >> case (will send a patch if I can confirm it)
> > >> the zoom is at 0 and you set it 10
> > >> if you read the value 2 times before the camera reaches value 10:
> > >> - First value will come from the hardware and the response will be cached
> > >
> > > Only if the control doesn't have the auto-update flag set, so it will be
> > > device-dependent. As GET_CUR should stall that's not really relevant,
> > > except for the fact that devices may not stall the request.
> >
> > OK, that helps a lot.
> >
> > If an operation is in progress, then setting a new control value should
> > result in -EBUSY. Based on the description above, I gather that even the
> > same fh that made the request cannot update it while the operation is
> > ongoing?
> 
> That is correct according to the spec. But both Laurent (correct me if
> I am wrong) and me suspect that there are devices that do not
> implement this properly.

With UVC it's more than suspicion. Whatever the feature, there's bound
to be a device that gets it wrong. The more niche the feature is, the
worse it can get.

> > Getting the control should just return the value that was set. I assume
> > that is cached in uvc?
> 
> If I get the code right... we only return the cached value when the
> field "loaded" is true. That happens when we read the device. So if
> the driver is just loaded
> 
> write(full controlA)
> read(full controlA)
> read(full controlA)
> 
> The first read will get the value from the hardware, the second will be cached.

It depends if the control is marked as auto-update or not. This is
orthogonal to asynchronous controls, so we shouldn't rely on it.

> > >> - Second value will be the cached one
> > >>
> > >> now the camera  is at zoom 10
> > >> If you read the value, you will read the cached value
> > >>
> > >>> E.g.: the zoom control is at value 0 and I set it to 10, then I poll the zoom control
> > >>> value: will that report the intermediate values until it reaches 10? And when it is
> > >>> at 10, the control event is sent?
> > >>>
> > >>>>>>> Cc: stable@vger.kernel.org
> > >>>>>>> Fixes: e5225c820c05 ("media: uvcvideo: Send a control event when a Control Change interrupt arrives")
> > >>>>>>> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> > >>>>>>> ---
> > >>>>>>>  drivers/media/usb/uvc/uvc_ctrl.c | 4 ++++
> > >>>>>>>  1 file changed, 4 insertions(+)
> > >>>>>>>
> > >>>>>>> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>>>>> index b6af4ff92cbd..3f8ae35cb3bc 100644
> > >>>>>>> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>>>>> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > >>>>>>> @@ -1955,6 +1955,10 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> > >>>>>>>       if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> > >>>>>>>               return -EACCES;
> > >>>>>>>
> > >>>>>>> +     /* Other file handle is waiting a response from this async control. */
> > >>>>>>> +     if (ctrl->handle && ctrl->handle != handle)
> > >>>>>>> +             return -EBUSY;
> > >>>>>>> +
> > >>>>>>>       /* Clamp out of range values. */
> > >>>>>>>       switch (mapping->v4l2_type) {
> > >>>>>>>       case V4L2_CTRL_TYPE_INTEGER:
Laurent Pinchart Nov. 29, 2024, 10:03 p.m. UTC | #16
On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> Before we all go on a well deserved weekend, let me recap what we
> know. If I did not get something correctly, let me know.
> 
> 1) Well behaved devices do not allow to set or get an incomplete async
> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> 2) Both Laurent and Ricardo consider that there is a big chance that
> some camera modules do not implement this properly. (ref: years of
> crying over broken module firmware :) )
>
> 3) ctrl->handle is designed to point to the fh that originated the
> control. So the logic can decide if the originator needs to be
> notified or not. (ref: uvc_ctrl_send_event() )
> 4) Right now we replace the originator in ctrl->handle for unfinished
> async controls.  (ref:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> 
> My interpretation is that:
> A) We need to change 4). We shall not change the originator of
> unfinished ctrl->handle.
> B) Well behaved cameras do not need the patch "Do not set an async
> control owned by another fh"
> C) For badly behaved cameras, it is fine if we slightly break the
> v4l2-compliance in corner cases, if we do not break any internal data
> structure.

The fact that some devices may not implement the documented behaviour
correctly may not be a problem. Well-behaved devices will stall, which
means we shouldn't query the device while as async update is in
progress. Badly-behaved devices, whatever they do when queried, should
not cause any issue if we don't query them.

We should not send GET_CUR and SET_CUR requests to the device while an
async update is in progress, and use cached values instead. When we
receive the async update event, we should clear the cache. This will be
the same for both well-behaved and badly-behaved devices, so we can
expose the same behaviour towards userspace.

We possibly also need some kind of timeout mechanism to cope with the
async update event not being delivered by the device.

Regarding the userspace behaviour during an auto-update, we have
multiple options:

For control get,

- We can return -EBUSY
- We can return the old value from the cache
- We can return the new value fromt he cache

Returning -EBUSY would be simpler to implement.

I don't think the behaviour should depend on whether the control is read
on the file handle that initiated the async operation or on a different
file handle.

For control set, I don't think we can do much else than returning
-EBUSY, regardless of which file handle the control is set on.

> I will send a new version with my interpretation.
> 
> Thanks for a great discussion
Ricardo Ribalda Nov. 29, 2024, 10:18 p.m. UTC | #17
On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > Before we all go on a well deserved weekend, let me recap what we
> > know. If I did not get something correctly, let me know.
> >
> > 1) Well behaved devices do not allow to set or get an incomplete async
> > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > 2) Both Laurent and Ricardo consider that there is a big chance that
> > some camera modules do not implement this properly. (ref: years of
> > crying over broken module firmware :) )
> >
> > 3) ctrl->handle is designed to point to the fh that originated the
> > control. So the logic can decide if the originator needs to be
> > notified or not. (ref: uvc_ctrl_send_event() )
> > 4) Right now we replace the originator in ctrl->handle for unfinished
> > async controls.  (ref:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >
> > My interpretation is that:
> > A) We need to change 4). We shall not change the originator of
> > unfinished ctrl->handle.
> > B) Well behaved cameras do not need the patch "Do not set an async
> > control owned by another fh"
> > C) For badly behaved cameras, it is fine if we slightly break the
> > v4l2-compliance in corner cases, if we do not break any internal data
> > structure.
>
> The fact that some devices may not implement the documented behaviour
> correctly may not be a problem. Well-behaved devices will stall, which
> means we shouldn't query the device while as async update is in
> progress. Badly-behaved devices, whatever they do when queried, should
> not cause any issue if we don't query them.

I thought we could detect the stall and return safely. Isn't that the case?
Why we have not seen issues with this?


>
> We should not send GET_CUR and SET_CUR requests to the device while an
> async update is in progress, and use cached values instead. When we
> receive the async update event, we should clear the cache. This will be
> the same for both well-behaved and badly-behaved devices, so we can
> expose the same behaviour towards userspace.

seting ctrl->loaded = 0 when we get an event sounds like a good idea
and something we can implement right away.
If I have to resend the set I will add it to the end.

>
> We possibly also need some kind of timeout mechanism to cope with the
> async update event not being delivered by the device.

This is the part that worries me the most:
- timeouts make the code fragile
- What is a good value for timeout? 1 second, 30, 300? I do not think
that we can find a value.


>
> Regarding the userspace behaviour during an auto-update, we have
> multiple options:
>
> For control get,
>
> - We can return -EBUSY
> - We can return the old value from the cache
> - We can return the new value fromt he cache
>
> Returning -EBUSY would be simpler to implement.
Not only easy, I think it is the most correct,

>
> I don't think the behaviour should depend on whether the control is read
> on the file handle that initiated the async operation or on a different
> file handle.
>
> For control set, I don't think we can do much else than returning
> -EBUSY, regardless of which file handle the control is set on.

ACK.

>
> > I will send a new version with my interpretation.
> >
> > Thanks for a great discussion
>
> --
> Regards,
>
> Laurent Pinchart

Looking with some perspective... I believe that we should look into
the "userspace behaviour for auto controls" in a different patchset.
It is slightly unrelated to this discussion.
Laurent Pinchart Dec. 2, 2024, 12:18 a.m. UTC | #18
On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> > On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > > Before we all go on a well deserved weekend, let me recap what we
> > > know. If I did not get something correctly, let me know.
> > >
> > > 1) Well behaved devices do not allow to set or get an incomplete async
> > > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > > 2) Both Laurent and Ricardo consider that there is a big chance that
> > > some camera modules do not implement this properly. (ref: years of
> > > crying over broken module firmware :) )
> > >
> > > 3) ctrl->handle is designed to point to the fh that originated the
> > > control. So the logic can decide if the originator needs to be
> > > notified or not. (ref: uvc_ctrl_send_event() )
> > > 4) Right now we replace the originator in ctrl->handle for unfinished
> > > async controls.  (ref:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> > >
> > > My interpretation is that:
> > > A) We need to change 4). We shall not change the originator of
> > > unfinished ctrl->handle.
> > > B) Well behaved cameras do not need the patch "Do not set an async
> > > control owned by another fh"
> > > C) For badly behaved cameras, it is fine if we slightly break the
> > > v4l2-compliance in corner cases, if we do not break any internal data
> > > structure.
> >
> > The fact that some devices may not implement the documented behaviour
> > correctly may not be a problem. Well-behaved devices will stall, which
> > means we shouldn't query the device while as async update is in
> > progress. Badly-behaved devices, whatever they do when queried, should
> > not cause any issue if we don't query them.
> 
> I thought we could detect the stall and return safely. Isn't that the case?

We could, but if we know the device will stall anyway, is there a reason
not to avoid issuing the request in the first place ?

> Why we have not seen issues with this?

I haven't tested a PTZ device for a very long time, and you would need
to hit a small time window to see the issue.

> > We should not send GET_CUR and SET_CUR requests to the device while an
> > async update is in progress, and use cached values instead. When we
> > receive the async update event, we should clear the cache. This will be
> > the same for both well-behaved and badly-behaved devices, so we can
> > expose the same behaviour towards userspace.
> 
> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> and something we can implement right away.
> If I have to resend the set I will add it to the end.
> 
> > We possibly also need some kind of timeout mechanism to cope with the
> > async update event not being delivered by the device.
> 
> This is the part that worries me the most:
> - timeouts make the code fragile
> - What is a good value for timeout? 1 second, 30, 300? I do not think
> that we can find a value.

I've been thinking about the implementation of uvc_fh cleanup over the
weekend, and having a timeout would have the nice advantage that we
could reference-count uvc_fh instead of implementing a cleanup that
walks over all controls when closing a file handle. I think it would
make the code simpler, and possibly safer too.

> > Regarding the userspace behaviour during an auto-update, we have
> > multiple options:
> >
> > For control get,
> >
> > - We can return -EBUSY
> > - We can return the old value from the cache
> > - We can return the new value fromt he cache
> >
> > Returning -EBUSY would be simpler to implement.
>
> Not only easy, I think it is the most correct,
> 
> > I don't think the behaviour should depend on whether the control is read
> > on the file handle that initiated the async operation or on a different
> > file handle.
> >
> > For control set, I don't think we can do much else than returning
> > -EBUSY, regardless of which file handle the control is set on.
> 
> ACK.
> 
> > > I will send a new version with my interpretation.
> > >
> > > Thanks for a great discussion
> 
> Looking with some perspective... I believe that we should look into
> the "userspace behaviour for auto controls" in a different patchset.
> It is slightly unrelated to this discussion.
Ricardo Ribalda Dec. 2, 2024, 7:28 a.m. UTC | #19
On Mon, 2 Dec 2024 at 01:19, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> > On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> > > On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > > > Before we all go on a well deserved weekend, let me recap what we
> > > > know. If I did not get something correctly, let me know.
> > > >
> > > > 1) Well behaved devices do not allow to set or get an incomplete async
> > > > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > > > 2) Both Laurent and Ricardo consider that there is a big chance that
> > > > some camera modules do not implement this properly. (ref: years of
> > > > crying over broken module firmware :) )
> > > >
> > > > 3) ctrl->handle is designed to point to the fh that originated the
> > > > control. So the logic can decide if the originator needs to be
> > > > notified or not. (ref: uvc_ctrl_send_event() )
> > > > 4) Right now we replace the originator in ctrl->handle for unfinished
> > > > async controls.  (ref:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> > > >
> > > > My interpretation is that:
> > > > A) We need to change 4). We shall not change the originator of
> > > > unfinished ctrl->handle.
> > > > B) Well behaved cameras do not need the patch "Do not set an async
> > > > control owned by another fh"
> > > > C) For badly behaved cameras, it is fine if we slightly break the
> > > > v4l2-compliance in corner cases, if we do not break any internal data
> > > > structure.
> > >
> > > The fact that some devices may not implement the documented behaviour
> > > correctly may not be a problem. Well-behaved devices will stall, which
> > > means we shouldn't query the device while as async update is in
> > > progress. Badly-behaved devices, whatever they do when queried, should
> > > not cause any issue if we don't query them.
> >
> > I thought we could detect the stall and return safely. Isn't that the case?
>
> We could, but if we know the device will stall anyway, is there a reason
> not to avoid issuing the request in the first place ?

Because there are always going to be devices that do not send the
event when the control has finished.

It is impossible to know the state of the device: Is the zoom still
moving or the device is not compliant?


>
> > Why we have not seen issues with this?
>
> I haven't tested a PTZ device for a very long time, and you would need
> to hit a small time window to see the issue.

Not that small, some devices take up to 10 seconds to go from the
smallest zoom to the biggest zoom.
I'd say that v4l2-ctl has a very big chance to hit any error.

BTW, homing can take an even longer time.

>
> > > We should not send GET_CUR and SET_CUR requests to the device while an
> > > async update is in progress, and use cached values instead. When we
> > > receive the async update event, we should clear the cache. This will be
> > > the same for both well-behaved and badly-behaved devices, so we can
> > > expose the same behaviour towards userspace.
> >
> > seting ctrl->loaded = 0 when we get an event sounds like a good idea
> > and something we can implement right away.
> > If I have to resend the set I will add it to the end.
> >
> > > We possibly also need some kind of timeout mechanism to cope with the
> > > async update event not being delivered by the device.
> >
> > This is the part that worries me the most:
> > - timeouts make the code fragile
> > - What is a good value for timeout? 1 second, 30, 300? I do not think
> > that we can find a value.
>
> I've been thinking about the implementation of uvc_fh cleanup over the
> weekend, and having a timeout would have the nice advantage that we
> could reference-count uvc_fh instead of implementing a cleanup that
> walks over all controls when closing a file handle. I think it would
> make the code simpler, and possibly safer too.

The problem with a timeout is:
- We do not know what is the right value for the timeout.
- We need to have one timeout per control, or implement a timeout
dispatcher mechanism, and that is racy by nature
- It will require introducing a new locking mechanism to avoid races
between the timeout handler and the event completion
- It introduces a new lifetime in the driver... I'd argue that it is
best to have less, not more.

I do not see many benefits....

What we can introduce on top of my set is something like this (just
pseudocode, do not scream at me :P)  That code can prevent stalls and
will work with misbehaving hardware.... (But I still do not know a
good value for timeout) and solve some of the issues that I mention.

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index f0e8a436a306..a55bd4b3ac07 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
        if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
                return -EACCES;

+       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
+               return -EBUSY;
+
        ret = __uvc_ctrl_load_cur(chain, ctrl);
        if (ret < 0)
                return ret;
@@ -1591,6 +1594,7 @@ static int uvc_ctrl_get_handle(struct uvc_fh
*handle, struct uvc_control *ctrl)
                return -EBUSY;

        ctrl->handle = handle;
+       ctrl->async_start_time = NOW;
        handle->pending_async_ctrls++;
        return 0;
 }
@@ -1982,6 +1986,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
        if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
                return -EACCES;

+       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
+               return -EBUSY;
+
        /* Clamp out of range values. */
        switch (mapping->v4l2_type) {
        case V4L2_CTRL_TYPE_INTEGER:
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index e0e4f099a210..5c82fae94dff 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -154,6 +154,8 @@ struct uvc_control {
                                 * File handle that initially changed the
                                 * async control.
                                 */
+
+       TIME async_start_time;
 };


In any case. Can we solve this in incremental steps? I think the
current patch seems simple and correct. We can introduce the timeout
later.


Thanks!

>
> > > Regarding the userspace behaviour during an auto-update, we have
> > > multiple options:
> > >
> > > For control get,
> > >
> > > - We can return -EBUSY
> > > - We can return the old value from the cache
> > > - We can return the new value fromt he cache
> > >
> > > Returning -EBUSY would be simpler to implement.
> >
> > Not only easy, I think it is the most correct,
> >
> > > I don't think the behaviour should depend on whether the control is read
> > > on the file handle that initiated the async operation or on a different
> > > file handle.
> > >
> > > For control set, I don't think we can do much else than returning
> > > -EBUSY, regardless of which file handle the control is set on.
> >
> > ACK.
> >
> > > > I will send a new version with my interpretation.
> > > >
> > > > Thanks for a great discussion
> >
> > Looking with some perspective... I believe that we should look into
> > the "userspace behaviour for auto controls" in a different patchset.
> > It is slightly unrelated to this discussion.
>
> --
> Regards,
>
> Laurent Pinchart
Hans Verkuil Dec. 2, 2024, 8:05 a.m. UTC | #20
On 02/12/2024 01:18, Laurent Pinchart wrote:
> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>> Before we all go on a well deserved weekend, let me recap what we
>>>> know. If I did not get something correctly, let me know.
>>>>
>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>> some camera modules do not implement this properly. (ref: years of
>>>> crying over broken module firmware :) )
>>>>
>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>> control. So the logic can decide if the originator needs to be
>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>> async controls.  (ref:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>
>>>> My interpretation is that:
>>>> A) We need to change 4). We shall not change the originator of
>>>> unfinished ctrl->handle.
>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>> control owned by another fh"
>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>> structure.
>>>
>>> The fact that some devices may not implement the documented behaviour
>>> correctly may not be a problem. Well-behaved devices will stall, which
>>> means we shouldn't query the device while as async update is in
>>> progress. Badly-behaved devices, whatever they do when queried, should
>>> not cause any issue if we don't query them.
>>
>> I thought we could detect the stall and return safely. Isn't that the case?
> 
> We could, but if we know the device will stall anyway, is there a reason
> not to avoid issuing the request in the first place ?
> 
>> Why we have not seen issues with this?
> 
> I haven't tested a PTZ device for a very long time, and you would need
> to hit a small time window to see the issue.
> 
>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>> async update is in progress, and use cached values instead. When we
>>> receive the async update event, we should clear the cache. This will be
>>> the same for both well-behaved and badly-behaved devices, so we can
>>> expose the same behaviour towards userspace.
>>
>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>> and something we can implement right away.
>> If I have to resend the set I will add it to the end.
>>
>>> We possibly also need some kind of timeout mechanism to cope with the
>>> async update event not being delivered by the device.
>>
>> This is the part that worries me the most:
>> - timeouts make the code fragile
>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>> that we can find a value.
> 
> I've been thinking about the implementation of uvc_fh cleanup over the
> weekend, and having a timeout would have the nice advantage that we
> could reference-count uvc_fh instead of implementing a cleanup that
> walks over all controls when closing a file handle. I think it would
> make the code simpler, and possibly safer too.
> 
>>> Regarding the userspace behaviour during an auto-update, we have
>>> multiple options:
>>>
>>> For control get,
>>>
>>> - We can return -EBUSY
>>> - We can return the old value from the cache

This would match the control behavior best. Only when the operation is
done is the control updated and the control event sent.

Some questions: is any of this documented for UVC? Because this is non-standard
behavior. Are there applications that rely on this? Should we perhaps add
proper support for this to the control framework? E.g. add an ASYNC flag and
document this?

Regards,

	Hans

>>> - We can return the new value fromt he cache
>>>
>>> Returning -EBUSY would be simpler to implement.
>>
>> Not only easy, I think it is the most correct,
>>
>>> I don't think the behaviour should depend on whether the control is read
>>> on the file handle that initiated the async operation or on a different
>>> file handle.
>>>
>>> For control set, I don't think we can do much else than returning
>>> -EBUSY, regardless of which file handle the control is set on.
>>
>> ACK.
>>
>>>> I will send a new version with my interpretation.
>>>>
>>>> Thanks for a great discussion
>>
>> Looking with some perspective... I believe that we should look into
>> the "userspace behaviour for auto controls" in a different patchset.
>> It is slightly unrelated to this discussion.
>
Laurent Pinchart Dec. 2, 2024, 8:11 a.m. UTC | #21
On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> On 02/12/2024 01:18, Laurent Pinchart wrote:
> > On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>> Before we all go on a well deserved weekend, let me recap what we
> >>>> know. If I did not get something correctly, let me know.
> >>>>
> >>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>> some camera modules do not implement this properly. (ref: years of
> >>>> crying over broken module firmware :) )
> >>>>
> >>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>> control. So the logic can decide if the originator needs to be
> >>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>> async controls.  (ref:
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>
> >>>> My interpretation is that:
> >>>> A) We need to change 4). We shall not change the originator of
> >>>> unfinished ctrl->handle.
> >>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>> control owned by another fh"
> >>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>> structure.
> >>>
> >>> The fact that some devices may not implement the documented behaviour
> >>> correctly may not be a problem. Well-behaved devices will stall, which
> >>> means we shouldn't query the device while as async update is in
> >>> progress. Badly-behaved devices, whatever they do when queried, should
> >>> not cause any issue if we don't query them.
> >>
> >> I thought we could detect the stall and return safely. Isn't that the case?
> > 
> > We could, but if we know the device will stall anyway, is there a reason
> > not to avoid issuing the request in the first place ?
> > 
> >> Why we have not seen issues with this?
> > 
> > I haven't tested a PTZ device for a very long time, and you would need
> > to hit a small time window to see the issue.
> > 
> >>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>> async update is in progress, and use cached values instead. When we
> >>> receive the async update event, we should clear the cache. This will be
> >>> the same for both well-behaved and badly-behaved devices, so we can
> >>> expose the same behaviour towards userspace.
> >>
> >> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >> and something we can implement right away.
> >> If I have to resend the set I will add it to the end.
> >>
> >>> We possibly also need some kind of timeout mechanism to cope with the
> >>> async update event not being delivered by the device.
> >>
> >> This is the part that worries me the most:
> >> - timeouts make the code fragile
> >> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >> that we can find a value.
> > 
> > I've been thinking about the implementation of uvc_fh cleanup over the
> > weekend, and having a timeout would have the nice advantage that we
> > could reference-count uvc_fh instead of implementing a cleanup that
> > walks over all controls when closing a file handle. I think it would
> > make the code simpler, and possibly safer too.
> > 
> >>> Regarding the userspace behaviour during an auto-update, we have
> >>> multiple options:
> >>>
> >>> For control get,
> >>>
> >>> - We can return -EBUSY
> >>> - We can return the old value from the cache
> 
> This would match the control behavior best. Only when the operation is
> done is the control updated and the control event sent.
> 
> Some questions: is any of this documented for UVC? Because this is non-standard

No this isn't documented.

> behavior. Are there applications that rely on this? Should we perhaps add

I don't know.

> proper support for this to the control framework? E.g. add an ASYNC flag and
> document this?

We could, but this is such a specific use case that I don't think is
worth adding complexity to the already complex control framework would
be worth it. What we could do is perhaps adding a flag for the userspace
API, but even there, I never like modelling an API with a single user.

> >>> - We can return the new value fromt he cache
> >>>
> >>> Returning -EBUSY would be simpler to implement.
> >>
> >> Not only easy, I think it is the most correct,
> >>
> >>> I don't think the behaviour should depend on whether the control is read
> >>> on the file handle that initiated the async operation or on a different
> >>> file handle.
> >>>
> >>> For control set, I don't think we can do much else than returning
> >>> -EBUSY, regardless of which file handle the control is set on.
> >>
> >> ACK.
> >>
> >>>> I will send a new version with my interpretation.
> >>>>
> >>>> Thanks for a great discussion
> >>
> >> Looking with some perspective... I believe that we should look into
> >> the "userspace behaviour for auto controls" in a different patchset.
> >> It is slightly unrelated to this discussion.
Laurent Pinchart Dec. 2, 2024, 8:38 a.m. UTC | #22
On Mon, Dec 02, 2024 at 08:28:48AM +0100, Ricardo Ribalda wrote:
> On Mon, 2 Dec 2024 at 01:19, Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >
> > On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> > > On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> > > > On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > > > > Before we all go on a well deserved weekend, let me recap what we
> > > > > know. If I did not get something correctly, let me know.
> > > > >
> > > > > 1) Well behaved devices do not allow to set or get an incomplete async
> > > > > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > > > > 2) Both Laurent and Ricardo consider that there is a big chance that
> > > > > some camera modules do not implement this properly. (ref: years of
> > > > > crying over broken module firmware :) )
> > > > >
> > > > > 3) ctrl->handle is designed to point to the fh that originated the
> > > > > control. So the logic can decide if the originator needs to be
> > > > > notified or not. (ref: uvc_ctrl_send_event() )
> > > > > 4) Right now we replace the originator in ctrl->handle for unfinished
> > > > > async controls.  (ref:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> > > > >
> > > > > My interpretation is that:
> > > > > A) We need to change 4). We shall not change the originator of
> > > > > unfinished ctrl->handle.
> > > > > B) Well behaved cameras do not need the patch "Do not set an async
> > > > > control owned by another fh"
> > > > > C) For badly behaved cameras, it is fine if we slightly break the
> > > > > v4l2-compliance in corner cases, if we do not break any internal data
> > > > > structure.
> > > >
> > > > The fact that some devices may not implement the documented behaviour
> > > > correctly may not be a problem. Well-behaved devices will stall, which
> > > > means we shouldn't query the device while as async update is in
> > > > progress. Badly-behaved devices, whatever they do when queried, should
> > > > not cause any issue if we don't query them.
> > >
> > > I thought we could detect the stall and return safely. Isn't that the case?
> >
> > We could, but if we know the device will stall anyway, is there a reason
> > not to avoid issuing the request in the first place ?
> 
> Because there are always going to be devices that do not send the
> event when the control has finished.
> 
> It is impossible to know the state of the device: Is the zoom still
> moving or the device is not compliant?

Another type of broken behaviour would be devices that don't correctly
handle UVC_GET and UVC_CUR requests while an async update is in
progress. If you issue those requests, those devices will break. Are
they more or less important than devices that don't send an event ?

All of those are partly theoretical problems. I'd rather keep it simple
for now until we get feedback about broken devices.

> > > Why we have not seen issues with this?
> >
> > I haven't tested a PTZ device for a very long time, and you would need
> > to hit a small time window to see the issue.
> 
> Not that small, some devices take up to 10 seconds to go from the
> smallest zoom to the biggest zoom.
> I'd say that v4l2-ctl has a very big chance to hit any error.
> 
> BTW, homing can take an even longer time.
> 
> > > > We should not send GET_CUR and SET_CUR requests to the device while an
> > > > async update is in progress, and use cached values instead. When we
> > > > receive the async update event, we should clear the cache. This will be
> > > > the same for both well-behaved and badly-behaved devices, so we can
> > > > expose the same behaviour towards userspace.
> > >
> > > seting ctrl->loaded = 0 when we get an event sounds like a good idea
> > > and something we can implement right away.
> > > If I have to resend the set I will add it to the end.
> > >
> > > > We possibly also need some kind of timeout mechanism to cope with the
> > > > async update event not being delivered by the device.
> > >
> > > This is the part that worries me the most:
> > > - timeouts make the code fragile
> > > - What is a good value for timeout? 1 second, 30, 300? I do not think
> > > that we can find a value.
> >
> > I've been thinking about the implementation of uvc_fh cleanup over the
> > weekend, and having a timeout would have the nice advantage that we
> > could reference-count uvc_fh instead of implementing a cleanup that
> > walks over all controls when closing a file handle. I think it would
> > make the code simpler, and possibly safer too.
> 
> The problem with a timeout is:
> - We do not know what is the right value for the timeout.
> - We need to have one timeout per control, or implement a timeout
> dispatcher mechanism, and that is racy by nature
> - It will require introducing a new locking mechanism to avoid races
> between the timeout handler and the event completion

Timeouts don't come for free, that's for sure.

> - It introduces a new lifetime in the driver... I'd argue that it is
> best to have less, not more.

That I disagree with, my experience with V4L2 (as well as with DRM, or
actually the kernel in general) is that we would be in a much better
place today if object lifetimes were handled with reference counting
more widely.

> I do not see many benefits....

The main benefit is simplifying the close() implementation and
complexity, as well as lowering lock contention (whether the latter
brings a real advantage in practice, I haven't checked)..

> What we can introduce on top of my set is something like this (just
> pseudocode, do not scream at me :P)  That code can prevent stalls and
> will work with misbehaving hardware.... (But I still do not know a
> good value for timeout) and solve some of the issues that I mention.
> 
> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> index f0e8a436a306..a55bd4b3ac07 100644
> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
>         if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
>                 return -EACCES;
> 
> +       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
> +               return -EBUSY;
> +
>         ret = __uvc_ctrl_load_cur(chain, ctrl);
>         if (ret < 0)
>                 return ret;
> @@ -1591,6 +1594,7 @@ static int uvc_ctrl_get_handle(struct uvc_fh
> *handle, struct uvc_control *ctrl)
>                 return -EBUSY;
> 
>         ctrl->handle = handle;
> +       ctrl->async_start_time = NOW;
>         handle->pending_async_ctrls++;
>         return 0;
>  }
> @@ -1982,6 +1986,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>         if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>                 return -EACCES;
> 
> +       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
> +               return -EBUSY;

If I understand you correctly, this is meant to prevent accessing the
device for an initial TIMEOUT period, based on the knowledge it will
very likely STALL. After the TIMEOUT, if the async set is still not
complete, SET_CUR will be issued, and a STALL will likely occur.

If we device to standardize on -EBUSY while the async set is in
progress, we need to do so before and after the timeout.
uvc_query_ctrl() will return -EBUSY in case of a STALL if the device
reports the "Not ready" error. This is what I expect a device to report
in this case. Of course the UVC specification, in section 2.4.4,
documents that the device will update the request error code control,
but doesn't tell which value should be used :-S I suppose we'll handle
broken devices if the need arise.

> +
>         /* Clamp out of range values. */
>         switch (mapping->v4l2_type) {
>         case V4L2_CTRL_TYPE_INTEGER:
> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index e0e4f099a210..5c82fae94dff 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -154,6 +154,8 @@ struct uvc_control {
>                                  * File handle that initially changed the
>                                  * async control.
>                                  */
> +
> +       TIME async_start_time;
>  };
> 
> 
> In any case. Can we solve this in incremental steps? I think the
> current patch seems simple and correct. We can introduce the timeout
> later.

The main reason for a timeout was to replace walking over all controls
at close() time with reference counting of uvc_fh, which is an
alternative approach to the current patch. Given that picking a good
timeout value is difficult, and that the current implementation already
skips the walk in the most common case when no async set is pending, I
suppose we could start with that.

> > > > Regarding the userspace behaviour during an auto-update, we have
> > > > multiple options:
> > > >
> > > > For control get,
> > > >
> > > > - We can return -EBUSY
> > > > - We can return the old value from the cache
> > > > - We can return the new value fromt he cache
> > > >
> > > > Returning -EBUSY would be simpler to implement.
> > >
> > > Not only easy, I think it is the most correct,
> > >
> > > > I don't think the behaviour should depend on whether the control is read
> > > > on the file handle that initiated the async operation or on a different
> > > > file handle.
> > > >
> > > > For control set, I don't think we can do much else than returning
> > > > -EBUSY, regardless of which file handle the control is set on.
> > >
> > > ACK.
> > >
> > > > > I will send a new version with my interpretation.
> > > > >
> > > > > Thanks for a great discussion
> > >
> > > Looking with some perspective... I believe that we should look into
> > > the "userspace behaviour for auto controls" in a different patchset.
> > > It is slightly unrelated to this discussion.
Hans Verkuil Dec. 2, 2024, 8:44 a.m. UTC | #23
On 02/12/2024 09:11, Laurent Pinchart wrote:
> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>> know. If I did not get something correctly, let me know.
>>>>>>
>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>> crying over broken module firmware :) )
>>>>>>
>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>> control. So the logic can decide if the originator needs to be
>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>> async controls.  (ref:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>
>>>>>> My interpretation is that:
>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>> unfinished ctrl->handle.
>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>> control owned by another fh"
>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>> structure.
>>>>>
>>>>> The fact that some devices may not implement the documented behaviour
>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>> means we shouldn't query the device while as async update is in
>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>> not cause any issue if we don't query them.
>>>>
>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>
>>> We could, but if we know the device will stall anyway, is there a reason
>>> not to avoid issuing the request in the first place ?
>>>
>>>> Why we have not seen issues with this?
>>>
>>> I haven't tested a PTZ device for a very long time, and you would need
>>> to hit a small time window to see the issue.
>>>
>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>> async update is in progress, and use cached values instead. When we
>>>>> receive the async update event, we should clear the cache. This will be
>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>> expose the same behaviour towards userspace.
>>>>
>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>> and something we can implement right away.
>>>> If I have to resend the set I will add it to the end.
>>>>
>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>> async update event not being delivered by the device.
>>>>
>>>> This is the part that worries me the most:
>>>> - timeouts make the code fragile
>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>> that we can find a value.
>>>
>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>> weekend, and having a timeout would have the nice advantage that we
>>> could reference-count uvc_fh instead of implementing a cleanup that
>>> walks over all controls when closing a file handle. I think it would
>>> make the code simpler, and possibly safer too.
>>>
>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>> multiple options:
>>>>>
>>>>> For control get,
>>>>>
>>>>> - We can return -EBUSY
>>>>> - We can return the old value from the cache
>>
>> This would match the control behavior best. Only when the operation is
>> done is the control updated and the control event sent.
>>
>> Some questions: is any of this documented for UVC? Because this is non-standard
> 
> No this isn't documented.
> 
>> behavior. Are there applications that rely on this? Should we perhaps add
> 
> I don't know.
> 
>> proper support for this to the control framework? E.g. add an ASYNC flag and
>> document this?
> 
> We could, but this is such a specific use case that I don't think is
> worth adding complexity to the already complex control framework would
> be worth it. What we could do is perhaps adding a flag for the userspace
> API, but even there, I never like modelling an API with a single user.

Well, it might be a single driver that uses this, but it is also the most
used driver by far. I think the only change is to add a flag for this and
describe how it should behave. And add v4l2-compliance tests for it.

Otherwise no changes to the control framework are needed, I think.

Controls with the ASYNC flag set would:

- return the old value from the cache.
- document that setting a new value while the operation is in progress
  results in EBUSY. Document that if the new value is equal to the old value,
  then return 0 and do nothing (alternative is to just immediately send
  the control changed event, but that might require a control framework change).
- when the operation finishes, update the cache to the new value and
  send the control changed event.
- document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
  when subscribing to the control if you calling fh wants to know when
  the operation finishes.
- document how timeouts should be handled: this is tricky, especially with
  bad hardware. I.e. if the hw doesn't send the event, does that mean that
  you are never able to set the control since it will stall?
  In the end this will just reflect how UVC handles this.

Regards,

	Hans

> 
>>>>> - We can return the new value fromt he cache
>>>>>
>>>>> Returning -EBUSY would be simpler to implement.
>>>>
>>>> Not only easy, I think it is the most correct,
>>>>
>>>>> I don't think the behaviour should depend on whether the control is read
>>>>> on the file handle that initiated the async operation or on a different
>>>>> file handle.
>>>>>
>>>>> For control set, I don't think we can do much else than returning
>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>
>>>> ACK.
>>>>
>>>>>> I will send a new version with my interpretation.
>>>>>>
>>>>>> Thanks for a great discussion
>>>>
>>>> Looking with some perspective... I believe that we should look into
>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>> It is slightly unrelated to this discussion.
>
Ricardo Ribalda Dec. 2, 2024, 10:11 a.m. UTC | #24
On Mon, 2 Dec 2024 at 09:38, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> On Mon, Dec 02, 2024 at 08:28:48AM +0100, Ricardo Ribalda wrote:
> > On Mon, 2 Dec 2024 at 01:19, Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > >
> > > On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> > > > On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> > > > > On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > > > > > Before we all go on a well deserved weekend, let me recap what we
> > > > > > know. If I did not get something correctly, let me know.
> > > > > >
> > > > > > 1) Well behaved devices do not allow to set or get an incomplete async
> > > > > > control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > > > > > 2) Both Laurent and Ricardo consider that there is a big chance that
> > > > > > some camera modules do not implement this properly. (ref: years of
> > > > > > crying over broken module firmware :) )
> > > > > >
> > > > > > 3) ctrl->handle is designed to point to the fh that originated the
> > > > > > control. So the logic can decide if the originator needs to be
> > > > > > notified or not. (ref: uvc_ctrl_send_event() )
> > > > > > 4) Right now we replace the originator in ctrl->handle for unfinished
> > > > > > async controls.  (ref:
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> > > > > >
> > > > > > My interpretation is that:
> > > > > > A) We need to change 4). We shall not change the originator of
> > > > > > unfinished ctrl->handle.
> > > > > > B) Well behaved cameras do not need the patch "Do not set an async
> > > > > > control owned by another fh"
> > > > > > C) For badly behaved cameras, it is fine if we slightly break the
> > > > > > v4l2-compliance in corner cases, if we do not break any internal data
> > > > > > structure.
> > > > >
> > > > > The fact that some devices may not implement the documented behaviour
> > > > > correctly may not be a problem. Well-behaved devices will stall, which
> > > > > means we shouldn't query the device while as async update is in
> > > > > progress. Badly-behaved devices, whatever they do when queried, should
> > > > > not cause any issue if we don't query them.
> > > >
> > > > I thought we could detect the stall and return safely. Isn't that the case?
> > >
> > > We could, but if we know the device will stall anyway, is there a reason
> > > not to avoid issuing the request in the first place ?
> >
> > Because there are always going to be devices that do not send the
> > event when the control has finished.
> >
> > It is impossible to know the state of the device: Is the zoom still
> > moving or the device is not compliant?
>
> Another type of broken behaviour would be devices that don't correctly
> handle UVC_GET and UVC_CUR requests while an async update is in
> progress. If you issue those requests, those devices will break. Are
> they more or less important than devices that don't send an event ?
>
> All of those are partly theoretical problems. I'd rather keep it simple
> for now until we get feedback about broken devices.
>
> > > > Why we have not seen issues with this?
> > >
> > > I haven't tested a PTZ device for a very long time, and you would need
> > > to hit a small time window to see the issue.
> >
> > Not that small, some devices take up to 10 seconds to go from the
> > smallest zoom to the biggest zoom.
> > I'd say that v4l2-ctl has a very big chance to hit any error.
> >
> > BTW, homing can take an even longer time.
> >
> > > > > We should not send GET_CUR and SET_CUR requests to the device while an
> > > > > async update is in progress, and use cached values instead. When we
> > > > > receive the async update event, we should clear the cache. This will be
> > > > > the same for both well-behaved and badly-behaved devices, so we can
> > > > > expose the same behaviour towards userspace.
> > > >
> > > > seting ctrl->loaded = 0 when we get an event sounds like a good idea
> > > > and something we can implement right away.
> > > > If I have to resend the set I will add it to the end.
> > > >
> > > > > We possibly also need some kind of timeout mechanism to cope with the
> > > > > async update event not being delivered by the device.
> > > >
> > > > This is the part that worries me the most:
> > > > - timeouts make the code fragile
> > > > - What is a good value for timeout? 1 second, 30, 300? I do not think
> > > > that we can find a value.
> > >
> > > I've been thinking about the implementation of uvc_fh cleanup over the
> > > weekend, and having a timeout would have the nice advantage that we
> > > could reference-count uvc_fh instead of implementing a cleanup that
> > > walks over all controls when closing a file handle. I think it would
> > > make the code simpler, and possibly safer too.
> >
> > The problem with a timeout is:
> > - We do not know what is the right value for the timeout.
> > - We need to have one timeout per control, or implement a timeout
> > dispatcher mechanism, and that is racy by nature
> > - It will require introducing a new locking mechanism to avoid races
> > between the timeout handler and the event completion
>
> Timeouts don't come for free, that's for sure.
>
> > - It introduces a new lifetime in the driver... I'd argue that it is
> > best to have less, not more.
>
> That I disagree with, my experience with V4L2 (as well as with DRM, or
> actually the kernel in general) is that we would be in a much better
> place today if object lifetimes were handled with reference counting
> more widely.
>
> > I do not see many benefits....
>
> The main benefit is simplifying the close() implementation and
> complexity, as well as lowering lock contention (whether the latter
> brings a real advantage in practice, I haven't checked)..
>
> > What we can introduce on top of my set is something like this (just
> > pseudocode, do not scream at me :P)  That code can prevent stalls and
> > will work with misbehaving hardware.... (But I still do not know a
> > good value for timeout) and solve some of the issues that I mention.
> >
> > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > index f0e8a436a306..a55bd4b3ac07 100644
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
> >         if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
> >                 return -EACCES;
> >
> > +       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
> > +               return -EBUSY;
> > +
> >         ret = __uvc_ctrl_load_cur(chain, ctrl);
> >         if (ret < 0)
> >                 return ret;
> > @@ -1591,6 +1594,7 @@ static int uvc_ctrl_get_handle(struct uvc_fh
> > *handle, struct uvc_control *ctrl)
> >                 return -EBUSY;
> >
> >         ctrl->handle = handle;
> > +       ctrl->async_start_time = NOW;
> >         handle->pending_async_ctrls++;
> >         return 0;
> >  }
> > @@ -1982,6 +1986,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >         if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >                 return -EACCES;
> >
> > +       if (ctrl->handle && ( NOW - ctrl->async_start_time ) < TIMEOUT)
> > +               return -EBUSY;
>
> If I understand you correctly, this is meant to prevent accessing the
> device for an initial TIMEOUT period, based on the knowledge it will
> very likely STALL. After the TIMEOUT, if the async set is still not
> complete, SET_CUR will be issued, and a STALL will likely occur.
>
> If we device to standardize on -EBUSY while the async set is in
> progress, we need to do so before and after the timeout.
> uvc_query_ctrl() will return -EBUSY in case of a STALL if the device
> reports the "Not ready" error. This is what I expect a device to report
> in this case. Of course the UVC specification, in section 2.4.4,
> documents that the device will update the request error code control,
> but doesn't tell which value should be used :-S I suppose we'll handle
> broken devices if the need arise.
>
> > +
> >         /* Clamp out of range values. */
> >         switch (mapping->v4l2_type) {
> >         case V4L2_CTRL_TYPE_INTEGER:
> > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > index e0e4f099a210..5c82fae94dff 100644
> > --- a/drivers/media/usb/uvc/uvcvideo.h
> > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > @@ -154,6 +154,8 @@ struct uvc_control {
> >                                  * File handle that initially changed the
> >                                  * async control.
> >                                  */
> > +
> > +       TIME async_start_time;
> >  };
> >
> >
> > In any case. Can we solve this in incremental steps? I think the
> > current patch seems simple and correct. We can introduce the timeout
> > later.
>
> The main reason for a timeout was to replace walking over all controls
> at close() time with reference counting of uvc_fh, which is an
> alternative approach to the current patch. Given that picking a good
> timeout value is difficult, and that the current implementation already
> skips the walk in the most common case when no async set is pending, I
> suppose we could start with that.

Cool. I am going to send a new version of:
https://patchwork.linuxtv.org/project/linux-media/patch/20241129-uvc-fix-async-v4-1-f23784dba80f@chromium.org/

using  uvc_ctrl_get_handle / uvc_ctrl_put_handle ad lockdep annotations.

Hopefully in the future we can get rid of: uvc_ctrl_cleanup_fh()


Thanks!
>
> > > > > Regarding the userspace behaviour during an auto-update, we have
> > > > > multiple options:
> > > > >
> > > > > For control get,
> > > > >
> > > > > - We can return -EBUSY
> > > > > - We can return the old value from the cache
> > > > > - We can return the new value fromt he cache
> > > > >
> > > > > Returning -EBUSY would be simpler to implement.
> > > >
> > > > Not only easy, I think it is the most correct,
> > > >
> > > > > I don't think the behaviour should depend on whether the control is read
> > > > > on the file handle that initiated the async operation or on a different
> > > > > file handle.
> > > > >
> > > > > For control set, I don't think we can do much else than returning
> > > > > -EBUSY, regardless of which file handle the control is set on.
> > > >
> > > > ACK.
> > > >
> > > > > > I will send a new version with my interpretation.
> > > > > >
> > > > > > Thanks for a great discussion
> > > >
> > > > Looking with some perspective... I believe that we should look into
> > > > the "userspace behaviour for auto controls" in a different patchset.
> > > > It is slightly unrelated to this discussion.
>
> --
> Regards,
>
> Laurent Pinchart
Hans de Goede Dec. 2, 2024, 10:26 a.m. UTC | #25
Hi,

On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> On 02/12/2024 09:11, Laurent Pinchart wrote:
>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>
>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>> crying over broken module firmware :) )
>>>>>>>
>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>> async controls.  (ref:
>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>
>>>>>>> My interpretation is that:
>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>> unfinished ctrl->handle.
>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>> control owned by another fh"
>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>> structure.
>>>>>>
>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>> means we shouldn't query the device while as async update is in
>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>> not cause any issue if we don't query them.
>>>>>
>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>
>>>> We could, but if we know the device will stall anyway, is there a reason
>>>> not to avoid issuing the request in the first place ?
>>>>
>>>>> Why we have not seen issues with this?
>>>>
>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>> to hit a small time window to see the issue.
>>>>
>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>> async update is in progress, and use cached values instead. When we
>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>> expose the same behaviour towards userspace.
>>>>>
>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>> and something we can implement right away.
>>>>> If I have to resend the set I will add it to the end.
>>>>>
>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>> async update event not being delivered by the device.
>>>>>
>>>>> This is the part that worries me the most:
>>>>> - timeouts make the code fragile
>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>> that we can find a value.
>>>>
>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>> weekend, and having a timeout would have the nice advantage that we
>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>> walks over all controls when closing a file handle. I think it would
>>>> make the code simpler, and possibly safer too.
>>>>
>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>> multiple options:
>>>>>>
>>>>>> For control get,
>>>>>>
>>>>>> - We can return -EBUSY
>>>>>> - We can return the old value from the cache
>>>
>>> This would match the control behavior best. Only when the operation is
>>> done is the control updated and the control event sent.
>>>
>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>
>> No this isn't documented.
>>
>>> behavior. Are there applications that rely on this? Should we perhaps add
>>
>> I don't know.
>>
>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>> document this?
>>
>> We could, but this is such a specific use case that I don't think is
>> worth adding complexity to the already complex control framework would
>> be worth it. What we could do is perhaps adding a flag for the userspace
>> API, but even there, I never like modelling an API with a single user.
> 
> Well, it might be a single driver that uses this, but it is also the most
> used driver by far. I think the only change is to add a flag for this and
> describe how it should behave. And add v4l2-compliance tests for it.
> 
> Otherwise no changes to the control framework are needed, I think.
> 
> Controls with the ASYNC flag set would:
> 
> - return the old value from the cache.
> - document that setting a new value while the operation is in progress
>   results in EBUSY. Document that if the new value is equal to the old value,
>   then return 0 and do nothing (alternative is to just immediately send
>   the control changed event, but that might require a control framework change).
> - when the operation finishes, update the cache to the new value and
>   send the control changed event.
> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>   when subscribing to the control if you calling fh wants to know when
>   the operation finishes.
> - document how timeouts should be handled: this is tricky, especially with
>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
>   you are never able to set the control since it will stall?
>   In the end this will just reflect how UVC handles this.

I have been catching up on this thread (I have not read the v3 and v4
threads yet).

This all started with Ricardo noticing that ctrl->handle may get
overwritten when another app sets the ctrl, causing the first app
to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
even though it set the ctrl itself.

My observations so far:

1. This is only hit when another app changes the ctrl after the first app,
in this case, if there is no stall issued by the hw for the second app's
request, arguably the first app getting the event for the ctrl is correct
since it was changed by the second app. IOW I think the current behavior
is not only fine, but even desirable. Assuming we only override ctrl->handle
after successfully sending the set-ctrl request to the hardware.

2. This adds a lot of complexity for not sending an event to the app
which made the change. Hans V. suggested maybe adding some sort of flag
for async ctrls to the userspace API. I wonder if we should not just
get rid of this complexity and document that these controls will always
generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
That would certainly simplify things, but it raises the questions if
this will cause issues for existing applications.

Note that if we simply return -EBUSY on set until acked by a status
event we also avoid the issue of ctrl->handle getting overwritten,
but that relies on reliable status events; or requires timeout handling.

3. I agree with Ricardo that a timeout based approach for cameras which
to not properly send status events for async ctrls is going to be
problematic. Things like pan/tilt homing can take multiple seconds which
is really long to use as a timeout if we plan to return -EBUSY until
the timeout triggers. I think it would be better to just rely on
the hardware sending a stall, or it accepting and correctly handling
a new CUR_SET command while the previous one is still being processed.

I guess we can track if the hw does send status events when async ctrls
complete and then do the -EBUSY thing without going out to the hw after
the first time an async ctrl has been acked by a status event.

And then combine that with the current behavior of overwriting ctrl->handle
until the ctrl has been marked as having working status events. So:

a) In case we do not know yet if a ctrl gets status-event acks; and
on devices without reliable status events keep current behavior.

b) As soon as we know a ctrl has reliable status events, switch to
returning -EBUSY if a set is pending (as indicated by ctrl->handle
being set).

I don't like the fact that this changes the behavior after the first
status event acking an async ctrl, but I don't really see another way.

Regards,

Hans




>>
>>>>>> - We can return the new value fromt he cache
>>>>>>
>>>>>> Returning -EBUSY would be simpler to implement.
>>>>>
>>>>> Not only easy, I think it is the most correct,
>>>>>
>>>>>> I don't think the behaviour should depend on whether the control is read
>>>>>> on the file handle that initiated the async operation or on a different
>>>>>> file handle.
>>>>>>
>>>>>> For control set, I don't think we can do much else than returning
>>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>>
>>>>> ACK.
>>>>>
>>>>>>> I will send a new version with my interpretation.
>>>>>>>
>>>>>>> Thanks for a great discussion
>>>>>
>>>>> Looking with some perspective... I believe that we should look into
>>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>>> It is slightly unrelated to this discussion.
>>
> 
>
Ricardo Ribalda Dec. 2, 2024, 10:50 a.m. UTC | #26
On Mon, 2 Dec 2024 at 11:27, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> > On 02/12/2024 09:11, Laurent Pinchart wrote:
> >> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> >>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> >>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>>>>> Before we all go on a well deserved weekend, let me recap what we
> >>>>>>> know. If I did not get something correctly, let me know.
> >>>>>>>
> >>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>>>>> some camera modules do not implement this properly. (ref: years of
> >>>>>>> crying over broken module firmware :) )
> >>>>>>>
> >>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>>>>> control. So the logic can decide if the originator needs to be
> >>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>>>>> async controls.  (ref:
> >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>>>>
> >>>>>>> My interpretation is that:
> >>>>>>> A) We need to change 4). We shall not change the originator of
> >>>>>>> unfinished ctrl->handle.
> >>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>>>>> control owned by another fh"
> >>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>>>>> structure.
> >>>>>>
> >>>>>> The fact that some devices may not implement the documented behaviour
> >>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> >>>>>> means we shouldn't query the device while as async update is in
> >>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> >>>>>> not cause any issue if we don't query them.
> >>>>>
> >>>>> I thought we could detect the stall and return safely. Isn't that the case?
> >>>>
> >>>> We could, but if we know the device will stall anyway, is there a reason
> >>>> not to avoid issuing the request in the first place ?
> >>>>
> >>>>> Why we have not seen issues with this?
> >>>>
> >>>> I haven't tested a PTZ device for a very long time, and you would need
> >>>> to hit a small time window to see the issue.
> >>>>
> >>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>>>>> async update is in progress, and use cached values instead. When we
> >>>>>> receive the async update event, we should clear the cache. This will be
> >>>>>> the same for both well-behaved and badly-behaved devices, so we can
> >>>>>> expose the same behaviour towards userspace.
> >>>>>
> >>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >>>>> and something we can implement right away.
> >>>>> If I have to resend the set I will add it to the end.
> >>>>>
> >>>>>> We possibly also need some kind of timeout mechanism to cope with the
> >>>>>> async update event not being delivered by the device.
> >>>>>
> >>>>> This is the part that worries me the most:
> >>>>> - timeouts make the code fragile
> >>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >>>>> that we can find a value.
> >>>>
> >>>> I've been thinking about the implementation of uvc_fh cleanup over the
> >>>> weekend, and having a timeout would have the nice advantage that we
> >>>> could reference-count uvc_fh instead of implementing a cleanup that
> >>>> walks over all controls when closing a file handle. I think it would
> >>>> make the code simpler, and possibly safer too.
> >>>>
> >>>>>> Regarding the userspace behaviour during an auto-update, we have
> >>>>>> multiple options:
> >>>>>>
> >>>>>> For control get,
> >>>>>>
> >>>>>> - We can return -EBUSY
> >>>>>> - We can return the old value from the cache
> >>>
> >>> This would match the control behavior best. Only when the operation is
> >>> done is the control updated and the control event sent.
> >>>
> >>> Some questions: is any of this documented for UVC? Because this is non-standard
> >>
> >> No this isn't documented.
> >>
> >>> behavior. Are there applications that rely on this? Should we perhaps add
> >>
> >> I don't know.
> >>
> >>> proper support for this to the control framework? E.g. add an ASYNC flag and
> >>> document this?
> >>
> >> We could, but this is such a specific use case that I don't think is
> >> worth adding complexity to the already complex control framework would
> >> be worth it. What we could do is perhaps adding a flag for the userspace
> >> API, but even there, I never like modelling an API with a single user.
> >
> > Well, it might be a single driver that uses this, but it is also the most
> > used driver by far. I think the only change is to add a flag for this and
> > describe how it should behave. And add v4l2-compliance tests for it.
> >
> > Otherwise no changes to the control framework are needed, I think.
> >
> > Controls with the ASYNC flag set would:
> >
> > - return the old value from the cache.
> > - document that setting a new value while the operation is in progress
> >   results in EBUSY. Document that if the new value is equal to the old value,
> >   then return 0 and do nothing (alternative is to just immediately send
> >   the control changed event, but that might require a control framework change).
> > - when the operation finishes, update the cache to the new value and
> >   send the control changed event.
> > - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> >   when subscribing to the control if you calling fh wants to know when
> >   the operation finishes.
> > - document how timeouts should be handled: this is tricky, especially with
> >   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> >   you are never able to set the control since it will stall?
> >   In the end this will just reflect how UVC handles this.
>
> I have been catching up on this thread (I have not read the v3 and v4
> threads yet).
>
> This all started with Ricardo noticing that ctrl->handle may get
> overwritten when another app sets the ctrl, causing the first app
> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> even though it set the ctrl itself.
>
> My observations so far:
>
> 1. This is only hit when another app changes the ctrl after the first app,
> in this case, if there is no stall issued by the hw for the second app's
> request, arguably the first app getting the event for the ctrl is correct

In other words, for non compliant cameras the current behaviour is
correct. For compliant cameras it is broken.

> since it was changed by the second app. IOW I think the current behavior
> is not only fine, but even desirable. Assuming we only override ctrl->handle
> after successfully sending the set-ctrl request to the hardware.

We are overriding ctrl->handle unconditionally, even if set-ctrl stalls.


>
> 2. This adds a lot of complexity for not sending an event to the app
> which made the change. Hans V. suggested maybe adding some sort of flag
> for async ctrls to the userspace API. I wonder if we should not just
> get rid of this complexity and document that these controls will always
> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> That would certainly simplify things, but it raises the questions if
> this will cause issues for existing applications.

To be honest, I am more concerned about the dangling pointers than the event.

Updating the doc to say that  ASYC controls always generate events
sounds good to me. But until we reach an agreement on the specifics
I'd rather land this fix and then we can take time to design an API
that works for compliant and non compliant hardware.

>
> Note that if we simply return -EBUSY on set until acked by a status
> event we also avoid the issue of ctrl->handle getting overwritten,
> but that relies on reliable status events; or requires timeout handling.
>
> 3. I agree with Ricardo that a timeout based approach for cameras which
> to not properly send status events for async ctrls is going to be
> problematic. Things like pan/tilt homing can take multiple seconds which
> is really long to use as a timeout if we plan to return -EBUSY until
> the timeout triggers. I think it would be better to just rely on
> the hardware sending a stall, or it accepting and correctly handling
> a new CUR_SET command while the previous one is still being processed.
>
> I guess we can track if the hw does send status events when async ctrls
> complete and then do the -EBUSY thing without going out to the hw after
> the first time an async ctrl has been acked by a status event.
>
> And then combine that with the current behavior of overwriting ctrl->handle
> until the ctrl has been marked as having working status events. So:
>
> a) In case we do not know yet if a ctrl gets status-event acks; and
> on devices without reliable status events keep current behavior.
>
> b) As soon as we know a ctrl has reliable status events, switch to
> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> being set).
>
> I don't like the fact that this changes the behavior after the first
> status event acking an async ctrl, but I don't really see another way.

If I understood you correctly, you are proposing the following quirk:

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index f0e8a436a306..1a554afeaa2f 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
        if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
                return -EACCES;

+       if (ctrl->handle && ctrl->async_event_works)
+               return -EBUSY;
+
        ret = __uvc_ctrl_load_cur(chain, ctrl);
        if (ret < 0)
                return ret;
@@ -1672,6 +1675,8 @@ bool uvc_ctrl_status_event_async(struct urb
*urb, struct uvc_video_chain *chain,
        /* Flush the control cache, the data might have changed. */
        ctrl->loaded = 0;

+       ctrl->async_event_works = true;
+
        if (list_empty(&ctrl->info.mappings))
                return false;

@@ -1982,6 +1987,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
        if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
                return -EACCES;

+       if (ctrl->handle && ctrl->async_event_works)
+               return -EBUSY;
+
        /* Clamp out of range values. */
        switch (mapping->v4l2_type) {
        case V4L2_CTRL_TYPE_INTEGER:
diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
index e0e4f099a210..0ef7c594eecb 100644
--- a/drivers/media/usb/uvc/uvcvideo.h
+++ b/drivers/media/usb/uvc/uvcvideo.h
@@ -154,6 +154,7 @@ struct uvc_control {
                                 * File handle that initially changed the
                                 * async control.
                                 */
+       bool async_event_works;
 };

The benefit is that we can predict a device returning STALL without
having to actually do the set/get operation.

We can add it as a follow-up patch.


>
> Regards,
>
> Hans
>
>
>
>
> >>
> >>>>>> - We can return the new value fromt he cache
> >>>>>>
> >>>>>> Returning -EBUSY would be simpler to implement.
> >>>>>
> >>>>> Not only easy, I think it is the most correct,
> >>>>>
> >>>>>> I don't think the behaviour should depend on whether the control is read
> >>>>>> on the file handle that initiated the async operation or on a different
> >>>>>> file handle.
> >>>>>>
> >>>>>> For control set, I don't think we can do much else than returning
> >>>>>> -EBUSY, regardless of which file handle the control is set on.
> >>>>>
> >>>>> ACK.
> >>>>>
> >>>>>>> I will send a new version with my interpretation.
> >>>>>>>
> >>>>>>> Thanks for a great discussion
> >>>>>
> >>>>> Looking with some perspective... I believe that we should look into
> >>>>> the "userspace behaviour for auto controls" in a different patchset.
> >>>>> It is slightly unrelated to this discussion.
> >>
> >
> >
>
Hans Verkuil Dec. 2, 2024, 10:55 a.m. UTC | #27
On 02/12/2024 11:26, Hans de Goede wrote:
> Hi,
> 
> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
>> On 02/12/2024 09:11, Laurent Pinchart wrote:
>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>>
>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>>> crying over broken module firmware :) )
>>>>>>>>
>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>>> async controls.  (ref:
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>>
>>>>>>>> My interpretation is that:
>>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>>> unfinished ctrl->handle.
>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>>> control owned by another fh"
>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>>> structure.
>>>>>>>
>>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>>> means we shouldn't query the device while as async update is in
>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>>> not cause any issue if we don't query them.
>>>>>>
>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>>
>>>>> We could, but if we know the device will stall anyway, is there a reason
>>>>> not to avoid issuing the request in the first place ?
>>>>>
>>>>>> Why we have not seen issues with this?
>>>>>
>>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>>> to hit a small time window to see the issue.
>>>>>
>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>>> async update is in progress, and use cached values instead. When we
>>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>>> expose the same behaviour towards userspace.
>>>>>>
>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>>> and something we can implement right away.
>>>>>> If I have to resend the set I will add it to the end.
>>>>>>
>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>>> async update event not being delivered by the device.
>>>>>>
>>>>>> This is the part that worries me the most:
>>>>>> - timeouts make the code fragile
>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>>> that we can find a value.
>>>>>
>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>>> weekend, and having a timeout would have the nice advantage that we
>>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>>> walks over all controls when closing a file handle. I think it would
>>>>> make the code simpler, and possibly safer too.
>>>>>
>>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>>> multiple options:
>>>>>>>
>>>>>>> For control get,
>>>>>>>
>>>>>>> - We can return -EBUSY
>>>>>>> - We can return the old value from the cache
>>>>
>>>> This would match the control behavior best. Only when the operation is
>>>> done is the control updated and the control event sent.
>>>>
>>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>>
>>> No this isn't documented.
>>>
>>>> behavior. Are there applications that rely on this? Should we perhaps add
>>>
>>> I don't know.
>>>
>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>>> document this?
>>>
>>> We could, but this is such a specific use case that I don't think is
>>> worth adding complexity to the already complex control framework would
>>> be worth it. What we could do is perhaps adding a flag for the userspace
>>> API, but even there, I never like modelling an API with a single user.
>>
>> Well, it might be a single driver that uses this, but it is also the most
>> used driver by far. I think the only change is to add a flag for this and
>> describe how it should behave. And add v4l2-compliance tests for it.
>>
>> Otherwise no changes to the control framework are needed, I think.
>>
>> Controls with the ASYNC flag set would:
>>
>> - return the old value from the cache.
>> - document that setting a new value while the operation is in progress
>>   results in EBUSY. Document that if the new value is equal to the old value,
>>   then return 0 and do nothing (alternative is to just immediately send
>>   the control changed event, but that might require a control framework change).
>> - when the operation finishes, update the cache to the new value and
>>   send the control changed event.
>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>>   when subscribing to the control if you calling fh wants to know when
>>   the operation finishes.
>> - document how timeouts should be handled: this is tricky, especially with
>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
>>   you are never able to set the control since it will stall?
>>   In the end this will just reflect how UVC handles this.
> 
> I have been catching up on this thread (I have not read the v3 and v4
> threads yet).
> 
> This all started with Ricardo noticing that ctrl->handle may get
> overwritten when another app sets the ctrl, causing the first app
> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> even though it set the ctrl itself.
> 
> My observations so far:
> 
> 1. This is only hit when another app changes the ctrl after the first app,
> in this case, if there is no stall issued by the hw for the second app's
> request, arguably the first app getting the event for the ctrl is correct
> since it was changed by the second app. IOW I think the current behavior
> is not only fine, but even desirable. Assuming we only override ctrl->handle
> after successfully sending the set-ctrl request to the hardware.
> 
> 2. This adds a lot of complexity for not sending an event to the app
> which made the change. Hans V. suggested maybe adding some sort of flag
> for async ctrls to the userspace API. I wonder if we should not just
> get rid of this complexity and document that these controls will always
> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> That would certainly simplify things, but it raises the questions if
> this will cause issues for existing applications.

I'm not that keen on this. That's why a new flag can come in handy since
if present, then that indicates that it makes sense to specify
V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.

This ensures that uvc follows the current v4l2 spec. It's also why I
prefer that g_ctrl will just return the old value until the new value
has been reached: that way the control event corresponds with the actual
updating of the control value.

That said, it's just my opinion and I am OK with UVC doing things a bit
differently. Just be aware that I have no objection to adding an ASYNC flag,
given how widely UVC is used.

Regards,

	Hans

> 
> Note that if we simply return -EBUSY on set until acked by a status
> event we also avoid the issue of ctrl->handle getting overwritten,
> but that relies on reliable status events; or requires timeout handling.
> 
> 3. I agree with Ricardo that a timeout based approach for cameras which
> to not properly send status events for async ctrls is going to be
> problematic. Things like pan/tilt homing can take multiple seconds which
> is really long to use as a timeout if we plan to return -EBUSY until
> the timeout triggers. I think it would be better to just rely on
> the hardware sending a stall, or it accepting and correctly handling
> a new CUR_SET command while the previous one is still being processed.
> 
> I guess we can track if the hw does send status events when async ctrls
> complete and then do the -EBUSY thing without going out to the hw after
> the first time an async ctrl has been acked by a status event.
> 
> And then combine that with the current behavior of overwriting ctrl->handle
> until the ctrl has been marked as having working status events. So:
> 
> a) In case we do not know yet if a ctrl gets status-event acks; and
> on devices without reliable status events keep current behavior.
> 
> b) As soon as we know a ctrl has reliable status events, switch to
> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> being set).
> 
> I don't like the fact that this changes the behavior after the first
> status event acking an async ctrl, but I don't really see another way.
> 
> Regards,
> 
> Hans
> 
> 
> 
> 
>>>
>>>>>>> - We can return the new value fromt he cache
>>>>>>>
>>>>>>> Returning -EBUSY would be simpler to implement.
>>>>>>
>>>>>> Not only easy, I think it is the most correct,
>>>>>>
>>>>>>> I don't think the behaviour should depend on whether the control is read
>>>>>>> on the file handle that initiated the async operation or on a different
>>>>>>> file handle.
>>>>>>>
>>>>>>> For control set, I don't think we can do much else than returning
>>>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>>>
>>>>>> ACK.
>>>>>>
>>>>>>>> I will send a new version with my interpretation.
>>>>>>>>
>>>>>>>> Thanks for a great discussion
>>>>>>
>>>>>> Looking with some perspective... I believe that we should look into
>>>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>>>> It is slightly unrelated to this discussion.
>>>
>>
>>
> 
>
Hans de Goede Dec. 2, 2024, 12:19 p.m. UTC | #28
Hi,

On 2-Dec-24 11:50 AM, Ricardo Ribalda wrote:
> On Mon, 2 Dec 2024 at 11:27, Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
>>> On 02/12/2024 09:11, Laurent Pinchart wrote:
>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>>>
>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>>>> crying over broken module firmware :) )
>>>>>>>>>
>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>>>> async controls.  (ref:
>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>>>
>>>>>>>>> My interpretation is that:
>>>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>>>> unfinished ctrl->handle.
>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>>>> control owned by another fh"
>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>>>> structure.
>>>>>>>>
>>>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>>>> means we shouldn't query the device while as async update is in
>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>>>> not cause any issue if we don't query them.
>>>>>>>
>>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>>>
>>>>>> We could, but if we know the device will stall anyway, is there a reason
>>>>>> not to avoid issuing the request in the first place ?
>>>>>>
>>>>>>> Why we have not seen issues with this?
>>>>>>
>>>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>>>> to hit a small time window to see the issue.
>>>>>>
>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>>>> async update is in progress, and use cached values instead. When we
>>>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>>>> expose the same behaviour towards userspace.
>>>>>>>
>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>>>> and something we can implement right away.
>>>>>>> If I have to resend the set I will add it to the end.
>>>>>>>
>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>>>> async update event not being delivered by the device.
>>>>>>>
>>>>>>> This is the part that worries me the most:
>>>>>>> - timeouts make the code fragile
>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>>>> that we can find a value.
>>>>>>
>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>>>> weekend, and having a timeout would have the nice advantage that we
>>>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>>>> walks over all controls when closing a file handle. I think it would
>>>>>> make the code simpler, and possibly safer too.
>>>>>>
>>>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>>>> multiple options:
>>>>>>>>
>>>>>>>> For control get,
>>>>>>>>
>>>>>>>> - We can return -EBUSY
>>>>>>>> - We can return the old value from the cache
>>>>>
>>>>> This would match the control behavior best. Only when the operation is
>>>>> done is the control updated and the control event sent.
>>>>>
>>>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>>>
>>>> No this isn't documented.
>>>>
>>>>> behavior. Are there applications that rely on this? Should we perhaps add
>>>>
>>>> I don't know.
>>>>
>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>>>> document this?
>>>>
>>>> We could, but this is such a specific use case that I don't think is
>>>> worth adding complexity to the already complex control framework would
>>>> be worth it. What we could do is perhaps adding a flag for the userspace
>>>> API, but even there, I never like modelling an API with a single user.
>>>
>>> Well, it might be a single driver that uses this, but it is also the most
>>> used driver by far. I think the only change is to add a flag for this and
>>> describe how it should behave. And add v4l2-compliance tests for it.
>>>
>>> Otherwise no changes to the control framework are needed, I think.
>>>
>>> Controls with the ASYNC flag set would:
>>>
>>> - return the old value from the cache.
>>> - document that setting a new value while the operation is in progress
>>>   results in EBUSY. Document that if the new value is equal to the old value,
>>>   then return 0 and do nothing (alternative is to just immediately send
>>>   the control changed event, but that might require a control framework change).
>>> - when the operation finishes, update the cache to the new value and
>>>   send the control changed event.
>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>>>   when subscribing to the control if you calling fh wants to know when
>>>   the operation finishes.
>>> - document how timeouts should be handled: this is tricky, especially with
>>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
>>>   you are never able to set the control since it will stall?
>>>   In the end this will just reflect how UVC handles this.
>>
>> I have been catching up on this thread (I have not read the v3 and v4
>> threads yet).
>>
>> This all started with Ricardo noticing that ctrl->handle may get
>> overwritten when another app sets the ctrl, causing the first app
>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
>> even though it set the ctrl itself.
>>
>> My observations so far:
>>
>> 1. This is only hit when another app changes the ctrl after the first app,
>> in this case, if there is no stall issued by the hw for the second app's
>> request, arguably the first app getting the event for the ctrl is correct
> 
> In other words, for non compliant cameras the current behaviour is
> correct. For compliant cameras it is broken.
> 
>> since it was changed by the second app. IOW I think the current behavior
>> is not only fine, but even desirable. Assuming we only override ctrl->handle
>> after successfully sending the set-ctrl request to the hardware.
> 
> We are overriding ctrl->handle unconditionally, even if set-ctrl stalls.

Right I was just looking at that. Since we hold chain->ctrl_mutex
from the start of uvc_ioctl_s_try_ext_ctrls() until we finish
uvc_ctrl_commit() we can delay setting ctrl->handle until
a successful commit.

Actually looking at this I think I've found another bug, uvc_ctrl_set()
always sets ctrl->handle also in the VIDIOC_TRY_EXT_CTRLS case in
which case we never send the SET_CUR command to the camera.

And the handle is not part of the data backed up / restored by
a rollback.

Moving the storing of the handle to a successful commit fixes
both the overriding of the handle in the stall case as well
as the bogus setting of the handle on VIDIOC_TRY_EXT_CTRLS.

So my suggestion would be not touching ctrl->handle from
ctrl_set() instead store the handle in a new ctrl->new_handle
variable there and copy the new_handle values to handle for affected
controls after a successful commit (before releasing the lock).

This will also successfully replace the handle for buggy
devices which do not report a stall but instead accept the
second SET_CUR. This replacement of handle is the correct thing
todo in this case since after the second SET_CUR is accepted
there is a change of the ctrl happening from the pov of the
issuer of the first SET_CUR.

>> 2. This adds a lot of complexity for not sending an event to the app
>> which made the change. Hans V. suggested maybe adding some sort of flag
>> for async ctrls to the userspace API. I wonder if we should not just
>> get rid of this complexity and document that these controls will always
>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
>> That would certainly simplify things, but it raises the questions if
>> this will cause issues for existing applications.
> 
> To be honest, I am more concerned about the dangling pointers than the event.

Ack.

> Updating the doc to say that  ASYC controls always generate events
> sounds good to me. But until we reach an agreement on the specifics
> I'd rather land this fix and then we can take time to design an API
> that works for compliant and non compliant hardware.

I agree that we should focus on fixing the dangling pointer problem
and your v4 series is heading in the right direction there.

I'm not sure if we should take your v4 series as is though, see above.

At a minimum I think the issue with setting ctrl->handle in
the VIDIOC_TRY_EXT_CTRLS case needs to be fixed.

>> Note that if we simply return -EBUSY on set until acked by a status
>> event we also avoid the issue of ctrl->handle getting overwritten,
>> but that relies on reliable status events; or requires timeout handling.
>>
>> 3. I agree with Ricardo that a timeout based approach for cameras which
>> to not properly send status events for async ctrls is going to be
>> problematic. Things like pan/tilt homing can take multiple seconds which
>> is really long to use as a timeout if we plan to return -EBUSY until
>> the timeout triggers. I think it would be better to just rely on
>> the hardware sending a stall, or it accepting and correctly handling
>> a new CUR_SET command while the previous one is still being processed.
>>
>> I guess we can track if the hw does send status events when async ctrls
>> complete and then do the -EBUSY thing without going out to the hw after
>> the first time an async ctrl has been acked by a status event.
>>
>> And then combine that with the current behavior of overwriting ctrl->handle
>> until the ctrl has been marked as having working status events. So:
>>
>> a) In case we do not know yet if a ctrl gets status-event acks; and
>> on devices without reliable status events keep current behavior.
>>
>> b) As soon as we know a ctrl has reliable status events, switch to
>> returning -EBUSY if a set is pending (as indicated by ctrl->handle
>> being set).
>>
>> I don't like the fact that this changes the behavior after the first
>> status event acking an async ctrl, but I don't really see another way.
> 
> If I understood you correctly, you are proposing the following quirk:
> 
> diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> index f0e8a436a306..1a554afeaa2f 100644
> --- a/drivers/media/usb/uvc/uvc_ctrl.c
> +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
>         if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
>                 return -EACCES;
> 
> +       if (ctrl->handle && ctrl->async_event_works)
> +               return -EBUSY;
> +
>         ret = __uvc_ctrl_load_cur(chain, ctrl);
>         if (ret < 0)
>                 return ret;
> @@ -1672,6 +1675,8 @@ bool uvc_ctrl_status_event_async(struct urb> *urb, struct uvc_video_chain *chain,
>         /* Flush the control cache, the data might have changed. */
>         ctrl->loaded = 0;
> 
> +       ctrl->async_event_works = true;
> +
>         if (list_empty(&ctrl->info.mappings))
>                 return false;
> 
> @@ -1982,6 +1987,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
>         if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
>                 return -EACCES;
> 
> +       if (ctrl->handle && ctrl->async_event_works)
> +               return -EBUSY;
> +

Yes this is what I'm proposing, except that this check should be
skipped in the VIDIOC_TRY_EXT_CTRLS case. I think we need to add
a "bool try" parameter to uvc_ctrl_set() and not look at / set
[new_]handle when this is set.



>         /* Clamp out of range values. */
>         switch (mapping->v4l2_type) {
>         case V4L2_CTRL_TYPE_INTEGER:
> diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> index e0e4f099a210..0ef7c594eecb 100644
> --- a/drivers/media/usb/uvc/uvcvideo.h
> +++ b/drivers/media/usb/uvc/uvcvideo.h
> @@ -154,6 +154,7 @@ struct uvc_control {
>                                  * File handle that initially changed the
>                                  * async control.
>                                  */
> +       bool async_event_works;
>  };
> 
> The benefit is that we can predict a device returning STALL without
> having to actually do the set/get operation.
> 
> We can add it as a follow-up patch.

Another benefit would be correctly returning -EBUSY when trying to get
the ctrl. I agree this could be done as a follow-up.

Regards,

Hans
Ricardo Ribalda Dec. 2, 2024, 1:29 p.m. UTC | #29
On Mon, 2 Dec 2024 at 13:19, Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 2-Dec-24 11:50 AM, Ricardo Ribalda wrote:
> > On Mon, 2 Dec 2024 at 11:27, Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> >>> On 02/12/2024 09:11, Laurent Pinchart wrote:
> >>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> >>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> >>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
> >>>>>>>>> know. If I did not get something correctly, let me know.
> >>>>>>>>>
> >>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>>>>>>> some camera modules do not implement this properly. (ref: years of
> >>>>>>>>> crying over broken module firmware :) )
> >>>>>>>>>
> >>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>>>>>>> control. So the logic can decide if the originator needs to be
> >>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>>>>>>> async controls.  (ref:
> >>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>>>>>>
> >>>>>>>>> My interpretation is that:
> >>>>>>>>> A) We need to change 4). We shall not change the originator of
> >>>>>>>>> unfinished ctrl->handle.
> >>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>>>>>>> control owned by another fh"
> >>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>>>>>>> structure.
> >>>>>>>>
> >>>>>>>> The fact that some devices may not implement the documented behaviour
> >>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> >>>>>>>> means we shouldn't query the device while as async update is in
> >>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> >>>>>>>> not cause any issue if we don't query them.
> >>>>>>>
> >>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
> >>>>>>
> >>>>>> We could, but if we know the device will stall anyway, is there a reason
> >>>>>> not to avoid issuing the request in the first place ?
> >>>>>>
> >>>>>>> Why we have not seen issues with this?
> >>>>>>
> >>>>>> I haven't tested a PTZ device for a very long time, and you would need
> >>>>>> to hit a small time window to see the issue.
> >>>>>>
> >>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>>>>>>> async update is in progress, and use cached values instead. When we
> >>>>>>>> receive the async update event, we should clear the cache. This will be
> >>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
> >>>>>>>> expose the same behaviour towards userspace.
> >>>>>>>
> >>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >>>>>>> and something we can implement right away.
> >>>>>>> If I have to resend the set I will add it to the end.
> >>>>>>>
> >>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
> >>>>>>>> async update event not being delivered by the device.
> >>>>>>>
> >>>>>>> This is the part that worries me the most:
> >>>>>>> - timeouts make the code fragile
> >>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >>>>>>> that we can find a value.
> >>>>>>
> >>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
> >>>>>> weekend, and having a timeout would have the nice advantage that we
> >>>>>> could reference-count uvc_fh instead of implementing a cleanup that
> >>>>>> walks over all controls when closing a file handle. I think it would
> >>>>>> make the code simpler, and possibly safer too.
> >>>>>>
> >>>>>>>> Regarding the userspace behaviour during an auto-update, we have
> >>>>>>>> multiple options:
> >>>>>>>>
> >>>>>>>> For control get,
> >>>>>>>>
> >>>>>>>> - We can return -EBUSY
> >>>>>>>> - We can return the old value from the cache
> >>>>>
> >>>>> This would match the control behavior best. Only when the operation is
> >>>>> done is the control updated and the control event sent.
> >>>>>
> >>>>> Some questions: is any of this documented for UVC? Because this is non-standard
> >>>>
> >>>> No this isn't documented.
> >>>>
> >>>>> behavior. Are there applications that rely on this? Should we perhaps add
> >>>>
> >>>> I don't know.
> >>>>
> >>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
> >>>>> document this?
> >>>>
> >>>> We could, but this is such a specific use case that I don't think is
> >>>> worth adding complexity to the already complex control framework would
> >>>> be worth it. What we could do is perhaps adding a flag for the userspace
> >>>> API, but even there, I never like modelling an API with a single user.
> >>>
> >>> Well, it might be a single driver that uses this, but it is also the most
> >>> used driver by far. I think the only change is to add a flag for this and
> >>> describe how it should behave. And add v4l2-compliance tests for it.
> >>>
> >>> Otherwise no changes to the control framework are needed, I think.
> >>>
> >>> Controls with the ASYNC flag set would:
> >>>
> >>> - return the old value from the cache.
> >>> - document that setting a new value while the operation is in progress
> >>>   results in EBUSY. Document that if the new value is equal to the old value,
> >>>   then return 0 and do nothing (alternative is to just immediately send
> >>>   the control changed event, but that might require a control framework change).
> >>> - when the operation finishes, update the cache to the new value and
> >>>   send the control changed event.
> >>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> >>>   when subscribing to the control if you calling fh wants to know when
> >>>   the operation finishes.
> >>> - document how timeouts should be handled: this is tricky, especially with
> >>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> >>>   you are never able to set the control since it will stall?
> >>>   In the end this will just reflect how UVC handles this.
> >>
> >> I have been catching up on this thread (I have not read the v3 and v4
> >> threads yet).
> >>
> >> This all started with Ricardo noticing that ctrl->handle may get
> >> overwritten when another app sets the ctrl, causing the first app
> >> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> >> even though it set the ctrl itself.
> >>
> >> My observations so far:
> >>
> >> 1. This is only hit when another app changes the ctrl after the first app,
> >> in this case, if there is no stall issued by the hw for the second app's
> >> request, arguably the first app getting the event for the ctrl is correct
> >
> > In other words, for non compliant cameras the current behaviour is
> > correct. For compliant cameras it is broken.
> >
> >> since it was changed by the second app. IOW I think the current behavior
> >> is not only fine, but even desirable. Assuming we only override ctrl->handle
> >> after successfully sending the set-ctrl request to the hardware.
> >
> > We are overriding ctrl->handle unconditionally, even if set-ctrl stalls.
>
> Right I was just looking at that. Since we hold chain->ctrl_mutex
> from the start of uvc_ioctl_s_try_ext_ctrls() until we finish
> uvc_ctrl_commit() we can delay setting ctrl->handle until
> a successful commit.
>
> Actually looking at this I think I've found another bug, uvc_ctrl_set()
> always sets ctrl->handle also in the VIDIOC_TRY_EXT_CTRLS case in
> which case we never send the SET_CUR command to the camera.
ack

>
> And the handle is not part of the data backed up / restored by
> a rollback.
>
> Moving the storing of the handle to a successful commit fixes
> both the overriding of the handle in the stall case as well
> as the bogus setting of the handle on VIDIOC_TRY_EXT_CTRLS.
>
> So my suggestion would be not touching ctrl->handle from
> ctrl_set() instead store the handle in a new ctrl->new_handle
> variable there and copy the new_handle values to handle for affected
> controls after a successful commit (before releasing the lock).

We do not need to do that. Instead, we can move the set to
uvc_ctrl_commit_entity().

I have a patchset ready with this change.



>
> This will also successfully replace the handle for buggy
> devices which do not report a stall but instead accept the
> second SET_CUR. This replacement of handle is the correct thing
> todo in this case since after the second SET_CUR is accepted
> there is a change of the ctrl happening from the pov of the
> issuer of the first SET_CUR.
>
> >> 2. This adds a lot of complexity for not sending an event to the app
> >> which made the change. Hans V. suggested maybe adding some sort of flag
> >> for async ctrls to the userspace API. I wonder if we should not just
> >> get rid of this complexity and document that these controls will always
> >> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> >> That would certainly simplify things, but it raises the questions if
> >> this will cause issues for existing applications.
> >
> > To be honest, I am more concerned about the dangling pointers than the event.
>
> Ack.
>
> > Updating the doc to say that  ASYC controls always generate events
> > sounds good to me. But until we reach an agreement on the specifics
> > I'd rather land this fix and then we can take time to design an API
> > that works for compliant and non compliant hardware.
>
> I agree that we should focus on fixing the dangling pointer problem
> and your v4 series is heading in the right direction there.
>
> I'm not sure if we should take your v4 series as is though, see above.
>
> At a minimum I think the issue with setting ctrl->handle in
> the VIDIOC_TRY_EXT_CTRLS case needs to be fixed.
>
> >> Note that if we simply return -EBUSY on set until acked by a status
> >> event we also avoid the issue of ctrl->handle getting overwritten,
> >> but that relies on reliable status events; or requires timeout handling.
> >>
> >> 3. I agree with Ricardo that a timeout based approach for cameras which
> >> to not properly send status events for async ctrls is going to be
> >> problematic. Things like pan/tilt homing can take multiple seconds which
> >> is really long to use as a timeout if we plan to return -EBUSY until
> >> the timeout triggers. I think it would be better to just rely on
> >> the hardware sending a stall, or it accepting and correctly handling
> >> a new CUR_SET command while the previous one is still being processed.
> >>
> >> I guess we can track if the hw does send status events when async ctrls
> >> complete and then do the -EBUSY thing without going out to the hw after
> >> the first time an async ctrl has been acked by a status event.
> >>
> >> And then combine that with the current behavior of overwriting ctrl->handle
> >> until the ctrl has been marked as having working status events. So:
> >>
> >> a) In case we do not know yet if a ctrl gets status-event acks; and
> >> on devices without reliable status events keep current behavior.
> >>
> >> b) As soon as we know a ctrl has reliable status events, switch to
> >> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> >> being set).
> >>
> >> I don't like the fact that this changes the behavior after the first
> >> status event acking an async ctrl, but I don't really see another way.
> >
> > If I understood you correctly, you are proposing the following quirk:
> >
> > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > index f0e8a436a306..1a554afeaa2f 100644
> > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
> >         if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
> >                 return -EACCES;
> >
> > +       if (ctrl->handle && ctrl->async_event_works)
> > +               return -EBUSY;
> > +
> >         ret = __uvc_ctrl_load_cur(chain, ctrl);
> >         if (ret < 0)
> >                 return ret;
> > @@ -1672,6 +1675,8 @@ bool uvc_ctrl_status_event_async(struct urb> *urb, struct uvc_video_chain *chain,
> >         /* Flush the control cache, the data might have changed. */
> >         ctrl->loaded = 0;
> >
> > +       ctrl->async_event_works = true;
> > +
> >         if (list_empty(&ctrl->info.mappings))
> >                 return false;
> >
> > @@ -1982,6 +1987,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> >         if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> >                 return -EACCES;
> >
> > +       if (ctrl->handle && ctrl->async_event_works)
> > +               return -EBUSY;
> > +
>
> Yes this is what I'm proposing, except that this check should be
> skipped in the VIDIOC_TRY_EXT_CTRLS case. I think we need to add
> a "bool try" parameter to uvc_ctrl_set() and not look at / set
> [new_]handle when this is set.

SGTM. Let's add this improvement later.

>
>
>
> >         /* Clamp out of range values. */
> >         switch (mapping->v4l2_type) {
> >         case V4L2_CTRL_TYPE_INTEGER:
> > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > index e0e4f099a210..0ef7c594eecb 100644
> > --- a/drivers/media/usb/uvc/uvcvideo.h
> > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > @@ -154,6 +154,7 @@ struct uvc_control {
> >                                  * File handle that initially changed the
> >                                  * async control.
> >                                  */
> > +       bool async_event_works;
> >  };
> >
> > The benefit is that we can predict a device returning STALL without
> > having to actually do the set/get operation.
> >
> > We can add it as a follow-up patch.
>
> Another benefit would be correctly returning -EBUSY when trying to get
> the ctrl. I agree this could be done as a follow-up.
>
> Regards,
>
> Hans
>
>
Laurent Pinchart Dec. 3, 2024, 5:18 p.m. UTC | #30
On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote:
> On 02/12/2024 11:26, Hans de Goede wrote:
> > On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> >> On 02/12/2024 09:11, Laurent Pinchart wrote:
> >>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> >>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> >>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>>>>>> Before we all go on a well deserved weekend, let me recap what we
> >>>>>>>> know. If I did not get something correctly, let me know.
> >>>>>>>>
> >>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>>>>>> some camera modules do not implement this properly. (ref: years of
> >>>>>>>> crying over broken module firmware :) )
> >>>>>>>>
> >>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>>>>>> control. So the logic can decide if the originator needs to be
> >>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>>>>>> async controls.  (ref:
> >>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>>>>>
> >>>>>>>> My interpretation is that:
> >>>>>>>> A) We need to change 4). We shall not change the originator of
> >>>>>>>> unfinished ctrl->handle.
> >>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>>>>>> control owned by another fh"
> >>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>>>>>> structure.
> >>>>>>>
> >>>>>>> The fact that some devices may not implement the documented behaviour
> >>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> >>>>>>> means we shouldn't query the device while as async update is in
> >>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> >>>>>>> not cause any issue if we don't query them.
> >>>>>>
> >>>>>> I thought we could detect the stall and return safely. Isn't that the case?
> >>>>>
> >>>>> We could, but if we know the device will stall anyway, is there a reason
> >>>>> not to avoid issuing the request in the first place ?
> >>>>>
> >>>>>> Why we have not seen issues with this?
> >>>>>
> >>>>> I haven't tested a PTZ device for a very long time, and you would need
> >>>>> to hit a small time window to see the issue.
> >>>>>
> >>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>>>>>> async update is in progress, and use cached values instead. When we
> >>>>>>> receive the async update event, we should clear the cache. This will be
> >>>>>>> the same for both well-behaved and badly-behaved devices, so we can
> >>>>>>> expose the same behaviour towards userspace.
> >>>>>>
> >>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >>>>>> and something we can implement right away.
> >>>>>> If I have to resend the set I will add it to the end.
> >>>>>>
> >>>>>>> We possibly also need some kind of timeout mechanism to cope with the
> >>>>>>> async update event not being delivered by the device.
> >>>>>>
> >>>>>> This is the part that worries me the most:
> >>>>>> - timeouts make the code fragile
> >>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >>>>>> that we can find a value.
> >>>>>
> >>>>> I've been thinking about the implementation of uvc_fh cleanup over the
> >>>>> weekend, and having a timeout would have the nice advantage that we
> >>>>> could reference-count uvc_fh instead of implementing a cleanup that
> >>>>> walks over all controls when closing a file handle. I think it would
> >>>>> make the code simpler, and possibly safer too.
> >>>>>
> >>>>>>> Regarding the userspace behaviour during an auto-update, we have
> >>>>>>> multiple options:
> >>>>>>>
> >>>>>>> For control get,
> >>>>>>>
> >>>>>>> - We can return -EBUSY
> >>>>>>> - We can return the old value from the cache
> >>>>
> >>>> This would match the control behavior best. Only when the operation is
> >>>> done is the control updated and the control event sent.
> >>>>
> >>>> Some questions: is any of this documented for UVC? Because this is non-standard
> >>>
> >>> No this isn't documented.
> >>>
> >>>> behavior. Are there applications that rely on this? Should we perhaps add
> >>>
> >>> I don't know.
> >>>
> >>>> proper support for this to the control framework? E.g. add an ASYNC flag and
> >>>> document this?
> >>>
> >>> We could, but this is such a specific use case that I don't think is
> >>> worth adding complexity to the already complex control framework would
> >>> be worth it. What we could do is perhaps adding a flag for the userspace
> >>> API, but even there, I never like modelling an API with a single user.
> >>
> >> Well, it might be a single driver that uses this, but it is also the most
> >> used driver by far. I think the only change is to add a flag for this and
> >> describe how it should behave. And add v4l2-compliance tests for it.
> >>
> >> Otherwise no changes to the control framework are needed, I think.
> >>
> >> Controls with the ASYNC flag set would:
> >>
> >> - return the old value from the cache.
> >> - document that setting a new value while the operation is in progress
> >>   results in EBUSY. Document that if the new value is equal to the old value,
> >>   then return 0 and do nothing (alternative is to just immediately send
> >>   the control changed event, but that might require a control framework change).
> >> - when the operation finishes, update the cache to the new value and
> >>   send the control changed event.
> >> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> >>   when subscribing to the control if you calling fh wants to know when
> >>   the operation finishes.
> >> - document how timeouts should be handled: this is tricky, especially with
> >>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> >>   you are never able to set the control since it will stall?
> >>   In the end this will just reflect how UVC handles this.
> > 
> > I have been catching up on this thread (I have not read the v3 and v4
> > threads yet).
> > 
> > This all started with Ricardo noticing that ctrl->handle may get
> > overwritten when another app sets the ctrl, causing the first app
> > to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> > even though it set the ctrl itself.
> > 
> > My observations so far:
> > 
> > 1. This is only hit when another app changes the ctrl after the first app,
> > in this case, if there is no stall issued by the hw for the second app's
> > request, arguably the first app getting the event for the ctrl is correct
> > since it was changed by the second app. IOW I think the current behavior
> > is not only fine, but even desirable. Assuming we only override ctrl->handle
> > after successfully sending the set-ctrl request to the hardware.

I think you're right.

> > 2. This adds a lot of complexity for not sending an event to the app
> > which made the change. Hans V. suggested maybe adding some sort of flag
> > for async ctrls to the userspace API. I wonder if we should not just
> > get rid of this complexity and document that these controls will always
> > generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> > That would certainly simplify things, but it raises the questions if
> > this will cause issues for existing applications.
> 
> I'm not that keen on this. That's why a new flag can come in handy since
> if present, then that indicates that it makes sense to specify
> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.
> 
> This ensures that uvc follows the current v4l2 spec. It's also why I
> prefer that g_ctrl will just return the old value until the new value
> has been reached: that way the control event corresponds with the actual
> updating of the control value.
> 
> That said, it's just my opinion and I am OK with UVC doing things a bit
> differently. Just be aware that I have no objection to adding an ASYNC flag,
> given how widely UVC is used.

My experience with the V4L2 control API is that we've overdesigned quite
a few things, and in particular control events. The independent
"capture" and "control panel" application model that V4L2 controls were
designed for is not really a good fit for the 21st century anymore. The
V4L2 API isn't rich enough to arbitrate between applications that are
not designed to collaborate, and way too rich when applications do
collaborate. The only two real use cases I found for control events are
async set completion notification, and notification of automatic changes
to other controls (and in particular changes to control limits) when a
control is set. The second use case isn't even something that we support
well today: to make it really usable, the change notification should be
*synchronous* with the control set ioctl, returning the information from
the same ioctl, not through asynchronous control events.

TL;DR: If I can pick an option, let's make things simpler, not more
complex.

> > Note that if we simply return -EBUSY on set until acked by a status
> > event we also avoid the issue of ctrl->handle getting overwritten,
> > but that relies on reliable status events; or requires timeout handling.
> > 
> > 3. I agree with Ricardo that a timeout based approach for cameras which
> > to not properly send status events for async ctrls is going to be
> > problematic. Things like pan/tilt homing can take multiple seconds which
> > is really long to use as a timeout if we plan to return -EBUSY until
> > the timeout triggers. I think it would be better to just rely on
> > the hardware sending a stall, or it accepting and correctly handling
> > a new CUR_SET command while the previous one is still being processed.
> > 
> > I guess we can track if the hw does send status events when async ctrls
> > complete and then do the -EBUSY thing without going out to the hw after
> > the first time an async ctrl has been acked by a status event.
> > 
> > And then combine that with the current behavior of overwriting ctrl->handle
> > until the ctrl has been marked as having working status events. So:
> > 
> > a) In case we do not know yet if a ctrl gets status-event acks; and
> > on devices without reliable status events keep current behavior.
> > 
> > b) As soon as we know a ctrl has reliable status events, switch to
> > returning -EBUSY if a set is pending (as indicated by ctrl->handle
> > being set).
> > 
> > I don't like the fact that this changes the behavior after the first
> > status event acking an async ctrl, but I don't really see another way.
> > 
> >>>>>>> - We can return the new value fromt he cache
> >>>>>>>
> >>>>>>> Returning -EBUSY would be simpler to implement.
> >>>>>>
> >>>>>> Not only easy, I think it is the most correct,
> >>>>>>
> >>>>>>> I don't think the behaviour should depend on whether the control is read
> >>>>>>> on the file handle that initiated the async operation or on a different
> >>>>>>> file handle.
> >>>>>>>
> >>>>>>> For control set, I don't think we can do much else than returning
> >>>>>>> -EBUSY, regardless of which file handle the control is set on.
> >>>>>>
> >>>>>> ACK.
> >>>>>>
> >>>>>>>> I will send a new version with my interpretation.
> >>>>>>>>
> >>>>>>>> Thanks for a great discussion
> >>>>>>
> >>>>>> Looking with some perspective... I believe that we should look into
> >>>>>> the "userspace behaviour for auto controls" in a different patchset.
> >>>>>> It is slightly unrelated to this discussion.
Laurent Pinchart Dec. 3, 2024, 7:32 p.m. UTC | #31
On Mon, Dec 02, 2024 at 02:29:24PM +0100, Ricardo Ribalda wrote:
> On Mon, 2 Dec 2024 at 13:19, Hans de Goede wrote:
> > On 2-Dec-24 11:50 AM, Ricardo Ribalda wrote:
> > > On Mon, 2 Dec 2024 at 11:27, Hans de Goede wrote:
> > >> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> > >>> On 02/12/2024 09:11, Laurent Pinchart wrote:
> > >>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> > >>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> > >>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> > >>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> > >>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> > >>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
> > >>>>>>>>> know. If I did not get something correctly, let me know.
> > >>>>>>>>>
> > >>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> > >>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> > >>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> > >>>>>>>>> some camera modules do not implement this properly. (ref: years of
> > >>>>>>>>> crying over broken module firmware :) )
> > >>>>>>>>>
> > >>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> > >>>>>>>>> control. So the logic can decide if the originator needs to be
> > >>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> > >>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> > >>>>>>>>> async controls.  (ref:
> > >>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> > >>>>>>>>>
> > >>>>>>>>> My interpretation is that:
> > >>>>>>>>> A) We need to change 4). We shall not change the originator of
> > >>>>>>>>> unfinished ctrl->handle.
> > >>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> > >>>>>>>>> control owned by another fh"
> > >>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> > >>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> > >>>>>>>>> structure.
> > >>>>>>>>
> > >>>>>>>> The fact that some devices may not implement the documented behaviour
> > >>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> > >>>>>>>> means we shouldn't query the device while as async update is in
> > >>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> > >>>>>>>> not cause any issue if we don't query them.
> > >>>>>>>
> > >>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
> > >>>>>>
> > >>>>>> We could, but if we know the device will stall anyway, is there a reason
> > >>>>>> not to avoid issuing the request in the first place ?
> > >>>>>>
> > >>>>>>> Why we have not seen issues with this?
> > >>>>>>
> > >>>>>> I haven't tested a PTZ device for a very long time, and you would need
> > >>>>>> to hit a small time window to see the issue.
> > >>>>>>
> > >>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> > >>>>>>>> async update is in progress, and use cached values instead. When we
> > >>>>>>>> receive the async update event, we should clear the cache. This will be
> > >>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
> > >>>>>>>> expose the same behaviour towards userspace.
> > >>>>>>>
> > >>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> > >>>>>>> and something we can implement right away.
> > >>>>>>> If I have to resend the set I will add it to the end.
> > >>>>>>>
> > >>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
> > >>>>>>>> async update event not being delivered by the device.
> > >>>>>>>
> > >>>>>>> This is the part that worries me the most:
> > >>>>>>> - timeouts make the code fragile
> > >>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> > >>>>>>> that we can find a value.
> > >>>>>>
> > >>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
> > >>>>>> weekend, and having a timeout would have the nice advantage that we
> > >>>>>> could reference-count uvc_fh instead of implementing a cleanup that
> > >>>>>> walks over all controls when closing a file handle. I think it would
> > >>>>>> make the code simpler, and possibly safer too.
> > >>>>>>
> > >>>>>>>> Regarding the userspace behaviour during an auto-update, we have
> > >>>>>>>> multiple options:
> > >>>>>>>>
> > >>>>>>>> For control get,
> > >>>>>>>>
> > >>>>>>>> - We can return -EBUSY
> > >>>>>>>> - We can return the old value from the cache
> > >>>>>
> > >>>>> This would match the control behavior best. Only when the operation is
> > >>>>> done is the control updated and the control event sent.
> > >>>>>
> > >>>>> Some questions: is any of this documented for UVC? Because this is non-standard
> > >>>>
> > >>>> No this isn't documented.
> > >>>>
> > >>>>> behavior. Are there applications that rely on this? Should we perhaps add
> > >>>>
> > >>>> I don't know.
> > >>>>
> > >>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
> > >>>>> document this?
> > >>>>
> > >>>> We could, but this is such a specific use case that I don't think is
> > >>>> worth adding complexity to the already complex control framework would
> > >>>> be worth it. What we could do is perhaps adding a flag for the userspace
> > >>>> API, but even there, I never like modelling an API with a single user.
> > >>>
> > >>> Well, it might be a single driver that uses this, but it is also the most
> > >>> used driver by far. I think the only change is to add a flag for this and
> > >>> describe how it should behave. And add v4l2-compliance tests for it.
> > >>>
> > >>> Otherwise no changes to the control framework are needed, I think.
> > >>>
> > >>> Controls with the ASYNC flag set would:
> > >>>
> > >>> - return the old value from the cache.
> > >>> - document that setting a new value while the operation is in progress
> > >>>   results in EBUSY. Document that if the new value is equal to the old value,
> > >>>   then return 0 and do nothing (alternative is to just immediately send
> > >>>   the control changed event, but that might require a control framework change).
> > >>> - when the operation finishes, update the cache to the new value and
> > >>>   send the control changed event.
> > >>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> > >>>   when subscribing to the control if you calling fh wants to know when
> > >>>   the operation finishes.
> > >>> - document how timeouts should be handled: this is tricky, especially with
> > >>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> > >>>   you are never able to set the control since it will stall?
> > >>>   In the end this will just reflect how UVC handles this.
> > >>
> > >> I have been catching up on this thread (I have not read the v3 and v4
> > >> threads yet).
> > >>
> > >> This all started with Ricardo noticing that ctrl->handle may get
> > >> overwritten when another app sets the ctrl, causing the first app
> > >> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> > >> even though it set the ctrl itself.
> > >>
> > >> My observations so far:
> > >>
> > >> 1. This is only hit when another app changes the ctrl after the first app,
> > >> in this case, if there is no stall issued by the hw for the second app's
> > >> request, arguably the first app getting the event for the ctrl is correct
> > >
> > > In other words, for non compliant cameras the current behaviour is
> > > correct. For compliant cameras it is broken.
> > >
> > >> since it was changed by the second app. IOW I think the current behavior
> > >> is not only fine, but even desirable. Assuming we only override ctrl->handle
> > >> after successfully sending the set-ctrl request to the hardware.
> > >
> > > We are overriding ctrl->handle unconditionally, even if set-ctrl stalls.
> >
> > Right I was just looking at that. Since we hold chain->ctrl_mutex
> > from the start of uvc_ioctl_s_try_ext_ctrls() until we finish
> > uvc_ctrl_commit() we can delay setting ctrl->handle until
> > a successful commit.
> >
> > Actually looking at this I think I've found another bug, uvc_ctrl_set()
> > always sets ctrl->handle also in the VIDIOC_TRY_EXT_CTRLS case in
> > which case we never send the SET_CUR command to the camera.
>
> ack
> 
> > And the handle is not part of the data backed up / restored by
> > a rollback.
> >
> > Moving the storing of the handle to a successful commit fixes
> > both the overriding of the handle in the stall case as well
> > as the bogus setting of the handle on VIDIOC_TRY_EXT_CTRLS.
> >
> > So my suggestion would be not touching ctrl->handle from
> > ctrl_set() instead store the handle in a new ctrl->new_handle
> > variable there and copy the new_handle values to handle for affected
> > controls after a successful commit (before releasing the lock).
> 
> We do not need to do that. Instead, we can move the set to
> uvc_ctrl_commit_entity().
> 
> I have a patchset ready with this change.

I find it fascinating that we're spending so much time discussing this
to implement the control events API, which is both complex and not very
useful in its current form :-) If I could, I'd drop all control events
except for async controls, and wouldn't support not setting
V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK for those. But that's a digression.

> > This will also successfully replace the handle for buggy
> > devices which do not report a stall but instead accept the
> > second SET_CUR. This replacement of handle is the correct thing
> > todo in this case since after the second SET_CUR is accepted
> > there is a change of the ctrl happening from the pov of the
> > issuer of the first SET_CUR.
> >
> > >> 2. This adds a lot of complexity for not sending an event to the app
> > >> which made the change. Hans V. suggested maybe adding some sort of flag
> > >> for async ctrls to the userspace API. I wonder if we should not just
> > >> get rid of this complexity and document that these controls will always
> > >> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> > >> That would certainly simplify things, but it raises the questions if
> > >> this will cause issues for existing applications.
> > >
> > > To be honest, I am more concerned about the dangling pointers than the event.
> >
> > Ack.
> >
> > > Updating the doc to say that  ASYC controls always generate events
> > > sounds good to me. But until we reach an agreement on the specifics
> > > I'd rather land this fix and then we can take time to design an API
> > > that works for compliant and non compliant hardware.
> >
> > I agree that we should focus on fixing the dangling pointer problem
> > and your v4 series is heading in the right direction there.
> >
> > I'm not sure if we should take your v4 series as is though, see above.
> >
> > At a minimum I think the issue with setting ctrl->handle in
> > the VIDIOC_TRY_EXT_CTRLS case needs to be fixed.

We have three clear issues that are not related to changing behaviour of
the async controls (aside from side effects of the fixes):

- Not setting ctrl->handle when trying controls
- Not setting ctrl->handle when setting a control fails
- Fixing the dangling pointer bug

The first two issues can probably be fixed with a single patch, and the
third issue with a second patch. I don't mind much about the order,
whatever makes development easier should be fine. We should backport
both patches in any case.

> > >> Note that if we simply return -EBUSY on set until acked by a status
> > >> event we also avoid the issue of ctrl->handle getting overwritten,
> > >> but that relies on reliable status events; or requires timeout handling.
> > >>
> > >> 3. I agree with Ricardo that a timeout based approach for cameras which
> > >> to not properly send status events for async ctrls is going to be
> > >> problematic. Things like pan/tilt homing can take multiple seconds which
> > >> is really long to use as a timeout if we plan to return -EBUSY until
> > >> the timeout triggers. I think it would be better to just rely on
> > >> the hardware sending a stall, or it accepting and correctly handling
> > >> a new CUR_SET command while the previous one is still being processed.
> > >>
> > >> I guess we can track if the hw does send status events when async ctrls
> > >> complete and then do the -EBUSY thing without going out to the hw after
> > >> the first time an async ctrl has been acked by a status event.

That sounds quite complex, and wouldn't guard against the status event
being occasionally lost. I'm more concerned about devices that only
occasionally mess up sending the status event, not devices that never
send it.

> > >>
> > >> And then combine that with the current behavior of overwriting ctrl->handle
> > >> until the ctrl has been marked as having working status events. So:
> > >>
> > >> a) In case we do not know yet if a ctrl gets status-event acks; and
> > >> on devices without reliable status events keep current behavior.
> > >>
> > >> b) As soon as we know a ctrl has reliable status events, switch to
> > >> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> > >> being set).
> > >>
> > >> I don't like the fact that this changes the behavior after the first
> > >> status event acking an async ctrl, but I don't really see another way.
> > >
> > > If I understood you correctly, you are proposing the following quirk:
> > >
> > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
> > > index f0e8a436a306..1a554afeaa2f 100644
> > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > @@ -1132,6 +1132,9 @@ static int __uvc_ctrl_get(struct uvc_video_chain *chain,
> > >         if ((ctrl->info.flags & UVC_CTRL_FLAG_GET_CUR) == 0)
> > >                 return -EACCES;
> > >
> > > +       if (ctrl->handle && ctrl->async_event_works)
> > > +               return -EBUSY;
> > > +
> > >         ret = __uvc_ctrl_load_cur(chain, ctrl);
> > >         if (ret < 0)
> > >                 return ret;
> > > @@ -1672,6 +1675,8 @@ bool uvc_ctrl_status_event_async(struct urb> *urb, struct uvc_video_chain *chain,
> > >         /* Flush the control cache, the data might have changed. */
> > >         ctrl->loaded = 0;
> > >
> > > +       ctrl->async_event_works = true;
> > > +
> > >         if (list_empty(&ctrl->info.mappings))
> > >                 return false;
> > >
> > > @@ -1982,6 +1987,9 @@ int uvc_ctrl_set(struct uvc_fh *handle,
> > >         if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
> > >                 return -EACCES;
> > >
> > > +       if (ctrl->handle && ctrl->async_event_works)
> > > +               return -EBUSY;
> > > +
> >
> > Yes this is what I'm proposing, except that this check should be
> > skipped in the VIDIOC_TRY_EXT_CTRLS case. I think we need to add
> > a "bool try" parameter to uvc_ctrl_set() and not look at / set
> > [new_]handle when this is set.
> 
> SGTM. Let's add this improvement later.

Let's handle behavioural changes, optimizations, and handling of broken
devices on top of the fixes, it's getting complex enough as-is.

> > >         /* Clamp out of range values. */
> > >         switch (mapping->v4l2_type) {
> > >         case V4L2_CTRL_TYPE_INTEGER:
> > > diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h
> > > index e0e4f099a210..0ef7c594eecb 100644
> > > --- a/drivers/media/usb/uvc/uvcvideo.h
> > > +++ b/drivers/media/usb/uvc/uvcvideo.h
> > > @@ -154,6 +154,7 @@ struct uvc_control {
> > >                                  * File handle that initially changed the
> > >                                  * async control.
> > >                                  */
> > > +       bool async_event_works;
> > >  };
> > >
> > > The benefit is that we can predict a device returning STALL without
> > > having to actually do the set/get operation.
> > >
> > > We can add it as a follow-up patch.
> >
> > Another benefit would be correctly returning -EBUSY when trying to get
> > the ctrl. I agree this could be done as a follow-up.
Hans Verkuil Dec. 4, 2024, 7:59 a.m. UTC | #32
On 03/12/2024 18:18, Laurent Pinchart wrote:
> On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote:
>> On 02/12/2024 11:26, Hans de Goede wrote:
>>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
>>>> On 02/12/2024 09:11, Laurent Pinchart wrote:
>>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>>>>
>>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>>>>> crying over broken module firmware :) )
>>>>>>>>>>
>>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>>>>> async controls.  (ref:
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>>>>
>>>>>>>>>> My interpretation is that:
>>>>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>>>>> unfinished ctrl->handle.
>>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>>>>> control owned by another fh"
>>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>>>>> structure.
>>>>>>>>>
>>>>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>>>>> means we shouldn't query the device while as async update is in
>>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>>>>> not cause any issue if we don't query them.
>>>>>>>>
>>>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>>>>
>>>>>>> We could, but if we know the device will stall anyway, is there a reason
>>>>>>> not to avoid issuing the request in the first place ?
>>>>>>>
>>>>>>>> Why we have not seen issues with this?
>>>>>>>
>>>>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>>>>> to hit a small time window to see the issue.
>>>>>>>
>>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>>>>> async update is in progress, and use cached values instead. When we
>>>>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>>>>> expose the same behaviour towards userspace.
>>>>>>>>
>>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>>>>> and something we can implement right away.
>>>>>>>> If I have to resend the set I will add it to the end.
>>>>>>>>
>>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>>>>> async update event not being delivered by the device.
>>>>>>>>
>>>>>>>> This is the part that worries me the most:
>>>>>>>> - timeouts make the code fragile
>>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>>>>> that we can find a value.
>>>>>>>
>>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>>>>> weekend, and having a timeout would have the nice advantage that we
>>>>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>>>>> walks over all controls when closing a file handle. I think it would
>>>>>>> make the code simpler, and possibly safer too.
>>>>>>>
>>>>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>>>>> multiple options:
>>>>>>>>>
>>>>>>>>> For control get,
>>>>>>>>>
>>>>>>>>> - We can return -EBUSY
>>>>>>>>> - We can return the old value from the cache
>>>>>>
>>>>>> This would match the control behavior best. Only when the operation is
>>>>>> done is the control updated and the control event sent.
>>>>>>
>>>>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>>>>
>>>>> No this isn't documented.
>>>>>
>>>>>> behavior. Are there applications that rely on this? Should we perhaps add
>>>>>
>>>>> I don't know.
>>>>>
>>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>>>>> document this?
>>>>>
>>>>> We could, but this is such a specific use case that I don't think is
>>>>> worth adding complexity to the already complex control framework would
>>>>> be worth it. What we could do is perhaps adding a flag for the userspace
>>>>> API, but even there, I never like modelling an API with a single user.
>>>>
>>>> Well, it might be a single driver that uses this, but it is also the most
>>>> used driver by far. I think the only change is to add a flag for this and
>>>> describe how it should behave. And add v4l2-compliance tests for it.
>>>>
>>>> Otherwise no changes to the control framework are needed, I think.
>>>>
>>>> Controls with the ASYNC flag set would:
>>>>
>>>> - return the old value from the cache.
>>>> - document that setting a new value while the operation is in progress
>>>>   results in EBUSY. Document that if the new value is equal to the old value,
>>>>   then return 0 and do nothing (alternative is to just immediately send
>>>>   the control changed event, but that might require a control framework change).
>>>> - when the operation finishes, update the cache to the new value and
>>>>   send the control changed event.
>>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>>>>   when subscribing to the control if you calling fh wants to know when
>>>>   the operation finishes.
>>>> - document how timeouts should be handled: this is tricky, especially with
>>>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
>>>>   you are never able to set the control since it will stall?
>>>>   In the end this will just reflect how UVC handles this.
>>>
>>> I have been catching up on this thread (I have not read the v3 and v4
>>> threads yet).
>>>
>>> This all started with Ricardo noticing that ctrl->handle may get
>>> overwritten when another app sets the ctrl, causing the first app
>>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
>>> even though it set the ctrl itself.
>>>
>>> My observations so far:
>>>
>>> 1. This is only hit when another app changes the ctrl after the first app,
>>> in this case, if there is no stall issued by the hw for the second app's
>>> request, arguably the first app getting the event for the ctrl is correct
>>> since it was changed by the second app. IOW I think the current behavior
>>> is not only fine, but even desirable. Assuming we only override ctrl->handle
>>> after successfully sending the set-ctrl request to the hardware.
> 
> I think you're right.
> 
>>> 2. This adds a lot of complexity for not sending an event to the app
>>> which made the change. Hans V. suggested maybe adding some sort of flag
>>> for async ctrls to the userspace API. I wonder if we should not just
>>> get rid of this complexity and document that these controls will always
>>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
>>> That would certainly simplify things, but it raises the questions if
>>> this will cause issues for existing applications.
>>
>> I'm not that keen on this. That's why a new flag can come in handy since
>> if present, then that indicates that it makes sense to specify
>> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.
>>
>> This ensures that uvc follows the current v4l2 spec. It's also why I
>> prefer that g_ctrl will just return the old value until the new value
>> has been reached: that way the control event corresponds with the actual
>> updating of the control value.
>>
>> That said, it's just my opinion and I am OK with UVC doing things a bit
>> differently. Just be aware that I have no objection to adding an ASYNC flag,
>> given how widely UVC is used.
> 
> My experience with the V4L2 control API is that we've overdesigned quite
> a few things, and in particular control events. The independent
> "capture" and "control panel" application model that V4L2 controls were
> designed for is not really a good fit for the 21st century anymore. The
> V4L2 API isn't rich enough to arbitrate between applications that are
> not designed to collaborate, and way too rich when applications do
> collaborate. The only two real use cases I found for control events are
> async set completion notification, and notification of automatic changes
> to other controls (and in particular changes to control limits) when a
> control is set. The second use case isn't even something that we support
> well today: to make it really usable, the change notification should be
> *synchronous* with the control set ioctl, returning the information from
> the same ioctl, not through asynchronous control events.

The main reason you think it is complicated is because the uvc driver does
not use the control framework, so it has to copy all the logic in the driver.
That's very painful. Ideally, uvc should use the control framework, but that
would require a complete overhaul of the uvc driver.

For all other drivers the complexity is zero since it is all in the framework.

Some of the Digital Video controls (HOTPLUG, EDID_PRESENT, RXSENSE,
POWER_PRESENT) are meant to be used with control events to inform the applications
when these things change. But you don't deal with HDMI video, so you never see
them in use. The control event mechanism is generic, i.e. available for all
controls. So the use in control panels is just one use-case and it is probably
just qv4l2 that implements it. But control events are great for anything that
happens asynchronously.

What is missing is support for asynchronous event like the zoom control that
takes time to finish the operation. Ideally I would prefer that it would operate
like the V4L2_CID_AUTO_FOCUS_* controls. But since the current mechanism is
already in use in UVC I am fine with the current uvc approach. I just think
this is something that should be signaled to userspace by a flag and that it
is properly documented.

Regarding the second use case: it's perfectly doable, but it would require a
new ioctl. You would need really good arguments for doing that.

Regards,

	Hans

> 
> TL;DR: If I can pick an option, let's make things simpler, not more
> complex.
> 
>>> Note that if we simply return -EBUSY on set until acked by a status
>>> event we also avoid the issue of ctrl->handle getting overwritten,
>>> but that relies on reliable status events; or requires timeout handling.
>>>
>>> 3. I agree with Ricardo that a timeout based approach for cameras which
>>> to not properly send status events for async ctrls is going to be
>>> problematic. Things like pan/tilt homing can take multiple seconds which
>>> is really long to use as a timeout if we plan to return -EBUSY until
>>> the timeout triggers. I think it would be better to just rely on
>>> the hardware sending a stall, or it accepting and correctly handling
>>> a new CUR_SET command while the previous one is still being processed.
>>>
>>> I guess we can track if the hw does send status events when async ctrls
>>> complete and then do the -EBUSY thing without going out to the hw after
>>> the first time an async ctrl has been acked by a status event.
>>>
>>> And then combine that with the current behavior of overwriting ctrl->handle
>>> until the ctrl has been marked as having working status events. So:
>>>
>>> a) In case we do not know yet if a ctrl gets status-event acks; and
>>> on devices without reliable status events keep current behavior.
>>>
>>> b) As soon as we know a ctrl has reliable status events, switch to
>>> returning -EBUSY if a set is pending (as indicated by ctrl->handle
>>> being set).
>>>
>>> I don't like the fact that this changes the behavior after the first
>>> status event acking an async ctrl, but I don't really see another way.
>>>
>>>>>>>>> - We can return the new value fromt he cache
>>>>>>>>>
>>>>>>>>> Returning -EBUSY would be simpler to implement.
>>>>>>>>
>>>>>>>> Not only easy, I think it is the most correct,
>>>>>>>>
>>>>>>>>> I don't think the behaviour should depend on whether the control is read
>>>>>>>>> on the file handle that initiated the async operation or on a different
>>>>>>>>> file handle.
>>>>>>>>>
>>>>>>>>> For control set, I don't think we can do much else than returning
>>>>>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>>>>>
>>>>>>>> ACK.
>>>>>>>>
>>>>>>>>>> I will send a new version with my interpretation.
>>>>>>>>>>
>>>>>>>>>> Thanks for a great discussion
>>>>>>>>
>>>>>>>> Looking with some perspective... I believe that we should look into
>>>>>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>>>>>> It is slightly unrelated to this discussion.
>
Hans Verkuil Dec. 4, 2024, 8:04 a.m. UTC | #33
On 04/12/2024 08:59, Hans Verkuil wrote:
> On 03/12/2024 18:18, Laurent Pinchart wrote:
>> On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote:
>>> On 02/12/2024 11:26, Hans de Goede wrote:
>>>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
>>>>> On 02/12/2024 09:11, Laurent Pinchart wrote:
>>>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
>>>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
>>>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
>>>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
>>>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
>>>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
>>>>>>>>>>> know. If I did not get something correctly, let me know.
>>>>>>>>>>>
>>>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
>>>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
>>>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
>>>>>>>>>>> some camera modules do not implement this properly. (ref: years of
>>>>>>>>>>> crying over broken module firmware :) )
>>>>>>>>>>>
>>>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
>>>>>>>>>>> control. So the logic can decide if the originator needs to be
>>>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
>>>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
>>>>>>>>>>> async controls.  (ref:
>>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
>>>>>>>>>>>
>>>>>>>>>>> My interpretation is that:
>>>>>>>>>>> A) We need to change 4). We shall not change the originator of
>>>>>>>>>>> unfinished ctrl->handle.
>>>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
>>>>>>>>>>> control owned by another fh"
>>>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
>>>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
>>>>>>>>>>> structure.
>>>>>>>>>>
>>>>>>>>>> The fact that some devices may not implement the documented behaviour
>>>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
>>>>>>>>>> means we shouldn't query the device while as async update is in
>>>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
>>>>>>>>>> not cause any issue if we don't query them.
>>>>>>>>>
>>>>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
>>>>>>>>
>>>>>>>> We could, but if we know the device will stall anyway, is there a reason
>>>>>>>> not to avoid issuing the request in the first place ?
>>>>>>>>
>>>>>>>>> Why we have not seen issues with this?
>>>>>>>>
>>>>>>>> I haven't tested a PTZ device for a very long time, and you would need
>>>>>>>> to hit a small time window to see the issue.
>>>>>>>>
>>>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
>>>>>>>>>> async update is in progress, and use cached values instead. When we
>>>>>>>>>> receive the async update event, we should clear the cache. This will be
>>>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
>>>>>>>>>> expose the same behaviour towards userspace.
>>>>>>>>>
>>>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
>>>>>>>>> and something we can implement right away.
>>>>>>>>> If I have to resend the set I will add it to the end.
>>>>>>>>>
>>>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
>>>>>>>>>> async update event not being delivered by the device.
>>>>>>>>>
>>>>>>>>> This is the part that worries me the most:
>>>>>>>>> - timeouts make the code fragile
>>>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
>>>>>>>>> that we can find a value.
>>>>>>>>
>>>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
>>>>>>>> weekend, and having a timeout would have the nice advantage that we
>>>>>>>> could reference-count uvc_fh instead of implementing a cleanup that
>>>>>>>> walks over all controls when closing a file handle. I think it would
>>>>>>>> make the code simpler, and possibly safer too.
>>>>>>>>
>>>>>>>>>> Regarding the userspace behaviour during an auto-update, we have
>>>>>>>>>> multiple options:
>>>>>>>>>>
>>>>>>>>>> For control get,
>>>>>>>>>>
>>>>>>>>>> - We can return -EBUSY
>>>>>>>>>> - We can return the old value from the cache
>>>>>>>
>>>>>>> This would match the control behavior best. Only when the operation is
>>>>>>> done is the control updated and the control event sent.
>>>>>>>
>>>>>>> Some questions: is any of this documented for UVC? Because this is non-standard
>>>>>>
>>>>>> No this isn't documented.
>>>>>>
>>>>>>> behavior. Are there applications that rely on this? Should we perhaps add
>>>>>>
>>>>>> I don't know.
>>>>>>
>>>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
>>>>>>> document this?
>>>>>>
>>>>>> We could, but this is such a specific use case that I don't think is
>>>>>> worth adding complexity to the already complex control framework would
>>>>>> be worth it. What we could do is perhaps adding a flag for the userspace
>>>>>> API, but even there, I never like modelling an API with a single user.
>>>>>
>>>>> Well, it might be a single driver that uses this, but it is also the most
>>>>> used driver by far. I think the only change is to add a flag for this and
>>>>> describe how it should behave. And add v4l2-compliance tests for it.
>>>>>
>>>>> Otherwise no changes to the control framework are needed, I think.
>>>>>
>>>>> Controls with the ASYNC flag set would:
>>>>>
>>>>> - return the old value from the cache.
>>>>> - document that setting a new value while the operation is in progress
>>>>>   results in EBUSY. Document that if the new value is equal to the old value,
>>>>>   then return 0 and do nothing (alternative is to just immediately send
>>>>>   the control changed event, but that might require a control framework change).
>>>>> - when the operation finishes, update the cache to the new value and
>>>>>   send the control changed event.
>>>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
>>>>>   when subscribing to the control if you calling fh wants to know when
>>>>>   the operation finishes.
>>>>> - document how timeouts should be handled: this is tricky, especially with
>>>>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
>>>>>   you are never able to set the control since it will stall?
>>>>>   In the end this will just reflect how UVC handles this.
>>>>
>>>> I have been catching up on this thread (I have not read the v3 and v4
>>>> threads yet).
>>>>
>>>> This all started with Ricardo noticing that ctrl->handle may get
>>>> overwritten when another app sets the ctrl, causing the first app
>>>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
>>>> even though it set the ctrl itself.
>>>>
>>>> My observations so far:
>>>>
>>>> 1. This is only hit when another app changes the ctrl after the first app,
>>>> in this case, if there is no stall issued by the hw for the second app's
>>>> request, arguably the first app getting the event for the ctrl is correct
>>>> since it was changed by the second app. IOW I think the current behavior
>>>> is not only fine, but even desirable. Assuming we only override ctrl->handle
>>>> after successfully sending the set-ctrl request to the hardware.
>>
>> I think you're right.
>>
>>>> 2. This adds a lot of complexity for not sending an event to the app
>>>> which made the change. Hans V. suggested maybe adding some sort of flag
>>>> for async ctrls to the userspace API. I wonder if we should not just
>>>> get rid of this complexity and document that these controls will always
>>>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
>>>> That would certainly simplify things, but it raises the questions if
>>>> this will cause issues for existing applications.
>>>
>>> I'm not that keen on this. That's why a new flag can come in handy since
>>> if present, then that indicates that it makes sense to specify
>>> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.
>>>
>>> This ensures that uvc follows the current v4l2 spec. It's also why I
>>> prefer that g_ctrl will just return the old value until the new value
>>> has been reached: that way the control event corresponds with the actual
>>> updating of the control value.
>>>
>>> That said, it's just my opinion and I am OK with UVC doing things a bit
>>> differently. Just be aware that I have no objection to adding an ASYNC flag,
>>> given how widely UVC is used.
>>
>> My experience with the V4L2 control API is that we've overdesigned quite
>> a few things, and in particular control events. The independent
>> "capture" and "control panel" application model that V4L2 controls were
>> designed for is not really a good fit for the 21st century anymore. The
>> V4L2 API isn't rich enough to arbitrate between applications that are
>> not designed to collaborate, and way too rich when applications do
>> collaborate. The only two real use cases I found for control events are
>> async set completion notification, and notification of automatic changes
>> to other controls (and in particular changes to control limits) when a
>> control is set. The second use case isn't even something that we support
>> well today: to make it really usable, the change notification should be
>> *synchronous* with the control set ioctl, returning the information from
>> the same ioctl, not through asynchronous control events.
> 
> The main reason you think it is complicated is because the uvc driver does
> not use the control framework, so it has to copy all the logic in the driver.
> That's very painful. Ideally, uvc should use the control framework, but that
> would require a complete overhaul of the uvc driver.
> 
> For all other drivers the complexity is zero since it is all in the framework.
> 
> Some of the Digital Video controls (HOTPLUG, EDID_PRESENT, RXSENSE,
> POWER_PRESENT) are meant to be used with control events to inform the applications
> when these things change. But you don't deal with HDMI video, so you never see
> them in use. The control event mechanism is generic, i.e. available for all
> controls. So the use in control panels is just one use-case and it is probably
> just qv4l2 that implements it. But control events are great for anything that
> happens asynchronously.
> 
> What is missing is support for asynchronous event like the zoom control that
> takes time to finish the operation. Ideally I would prefer that it would operate
> like the V4L2_CID_AUTO_FOCUS_* controls. But since the current mechanism is
> already in use in UVC I am fine with the current uvc approach. I just think
> this is something that should be signaled to userspace by a flag and that it
> is properly documented.

Sorry for this second post, I just wanted to say that in my opinion it is OK if
frameworks are complicated internally. That's the whole point of a framework: to
put the complexity in one place and hide it from the users of the framework.

If a framework was simple, then you probably wouldn't have needed a framework in
the first place. The problem with uvc is that you can't use the framework so all
the complexity now enters the driver :-(

Regards,

	Hans

> 
> Regarding the second use case: it's perfectly doable, but it would require a
> new ioctl. You would need really good arguments for doing that.
> 
> Regards,
> 
> 	Hans
> 
>>
>> TL;DR: If I can pick an option, let's make things simpler, not more
>> complex.
>>
>>>> Note that if we simply return -EBUSY on set until acked by a status
>>>> event we also avoid the issue of ctrl->handle getting overwritten,
>>>> but that relies on reliable status events; or requires timeout handling.
>>>>
>>>> 3. I agree with Ricardo that a timeout based approach for cameras which
>>>> to not properly send status events for async ctrls is going to be
>>>> problematic. Things like pan/tilt homing can take multiple seconds which
>>>> is really long to use as a timeout if we plan to return -EBUSY until
>>>> the timeout triggers. I think it would be better to just rely on
>>>> the hardware sending a stall, or it accepting and correctly handling
>>>> a new CUR_SET command while the previous one is still being processed.
>>>>
>>>> I guess we can track if the hw does send status events when async ctrls
>>>> complete and then do the -EBUSY thing without going out to the hw after
>>>> the first time an async ctrl has been acked by a status event.
>>>>
>>>> And then combine that with the current behavior of overwriting ctrl->handle
>>>> until the ctrl has been marked as having working status events. So:
>>>>
>>>> a) In case we do not know yet if a ctrl gets status-event acks; and
>>>> on devices without reliable status events keep current behavior.
>>>>
>>>> b) As soon as we know a ctrl has reliable status events, switch to
>>>> returning -EBUSY if a set is pending (as indicated by ctrl->handle
>>>> being set).
>>>>
>>>> I don't like the fact that this changes the behavior after the first
>>>> status event acking an async ctrl, but I don't really see another way.
>>>>
>>>>>>>>>> - We can return the new value fromt he cache
>>>>>>>>>>
>>>>>>>>>> Returning -EBUSY would be simpler to implement.
>>>>>>>>>
>>>>>>>>> Not only easy, I think it is the most correct,
>>>>>>>>>
>>>>>>>>>> I don't think the behaviour should depend on whether the control is read
>>>>>>>>>> on the file handle that initiated the async operation or on a different
>>>>>>>>>> file handle.
>>>>>>>>>>
>>>>>>>>>> For control set, I don't think we can do much else than returning
>>>>>>>>>> -EBUSY, regardless of which file handle the control is set on.
>>>>>>>>>
>>>>>>>>> ACK.
>>>>>>>>>
>>>>>>>>>>> I will send a new version with my interpretation.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for a great discussion
>>>>>>>>>
>>>>>>>>> Looking with some perspective... I believe that we should look into
>>>>>>>>> the "userspace behaviour for auto controls" in a different patchset.
>>>>>>>>> It is slightly unrelated to this discussion.
>>
>
Laurent Pinchart Dec. 4, 2024, 9:02 a.m. UTC | #34
Hi Hans,

On Wed, Dec 04, 2024 at 08:59:04AM +0100, Hans Verkuil wrote:
> On 03/12/2024 18:18, Laurent Pinchart wrote:
> > On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote:
> >> On 02/12/2024 11:26, Hans de Goede wrote:
> >>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> >>>> On 02/12/2024 09:11, Laurent Pinchart wrote:
> >>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> >>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> >>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
> >>>>>>>>>> know. If I did not get something correctly, let me know.
> >>>>>>>>>>
> >>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>>>>>>>> some camera modules do not implement this properly. (ref: years of
> >>>>>>>>>> crying over broken module firmware :) )
> >>>>>>>>>>
> >>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>>>>>>>> control. So the logic can decide if the originator needs to be
> >>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>>>>>>>> async controls.  (ref:
> >>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>>>>>>>
> >>>>>>>>>> My interpretation is that:
> >>>>>>>>>> A) We need to change 4). We shall not change the originator of
> >>>>>>>>>> unfinished ctrl->handle.
> >>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>>>>>>>> control owned by another fh"
> >>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>>>>>>>> structure.
> >>>>>>>>>
> >>>>>>>>> The fact that some devices may not implement the documented behaviour
> >>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> >>>>>>>>> means we shouldn't query the device while as async update is in
> >>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> >>>>>>>>> not cause any issue if we don't query them.
> >>>>>>>>
> >>>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
> >>>>>>>
> >>>>>>> We could, but if we know the device will stall anyway, is there a reason
> >>>>>>> not to avoid issuing the request in the first place ?
> >>>>>>>
> >>>>>>>> Why we have not seen issues with this?
> >>>>>>>
> >>>>>>> I haven't tested a PTZ device for a very long time, and you would need
> >>>>>>> to hit a small time window to see the issue.
> >>>>>>>
> >>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>>>>>>>> async update is in progress, and use cached values instead. When we
> >>>>>>>>> receive the async update event, we should clear the cache. This will be
> >>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
> >>>>>>>>> expose the same behaviour towards userspace.
> >>>>>>>>
> >>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >>>>>>>> and something we can implement right away.
> >>>>>>>> If I have to resend the set I will add it to the end.
> >>>>>>>>
> >>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
> >>>>>>>>> async update event not being delivered by the device.
> >>>>>>>>
> >>>>>>>> This is the part that worries me the most:
> >>>>>>>> - timeouts make the code fragile
> >>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >>>>>>>> that we can find a value.
> >>>>>>>
> >>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
> >>>>>>> weekend, and having a timeout would have the nice advantage that we
> >>>>>>> could reference-count uvc_fh instead of implementing a cleanup that
> >>>>>>> walks over all controls when closing a file handle. I think it would
> >>>>>>> make the code simpler, and possibly safer too.
> >>>>>>>
> >>>>>>>>> Regarding the userspace behaviour during an auto-update, we have
> >>>>>>>>> multiple options:
> >>>>>>>>>
> >>>>>>>>> For control get,
> >>>>>>>>>
> >>>>>>>>> - We can return -EBUSY
> >>>>>>>>> - We can return the old value from the cache
> >>>>>>
> >>>>>> This would match the control behavior best. Only when the operation is
> >>>>>> done is the control updated and the control event sent.
> >>>>>>
> >>>>>> Some questions: is any of this documented for UVC? Because this is non-standard
> >>>>>
> >>>>> No this isn't documented.
> >>>>>
> >>>>>> behavior. Are there applications that rely on this? Should we perhaps add
> >>>>>
> >>>>> I don't know.
> >>>>>
> >>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
> >>>>>> document this?
> >>>>>
> >>>>> We could, but this is such a specific use case that I don't think is
> >>>>> worth adding complexity to the already complex control framework would
> >>>>> be worth it. What we could do is perhaps adding a flag for the userspace
> >>>>> API, but even there, I never like modelling an API with a single user.
> >>>>
> >>>> Well, it might be a single driver that uses this, but it is also the most
> >>>> used driver by far. I think the only change is to add a flag for this and
> >>>> describe how it should behave. And add v4l2-compliance tests for it.
> >>>>
> >>>> Otherwise no changes to the control framework are needed, I think.
> >>>>
> >>>> Controls with the ASYNC flag set would:
> >>>>
> >>>> - return the old value from the cache.
> >>>> - document that setting a new value while the operation is in progress
> >>>>   results in EBUSY. Document that if the new value is equal to the old value,
> >>>>   then return 0 and do nothing (alternative is to just immediately send
> >>>>   the control changed event, but that might require a control framework change).
> >>>> - when the operation finishes, update the cache to the new value and
> >>>>   send the control changed event.
> >>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> >>>>   when subscribing to the control if you calling fh wants to know when
> >>>>   the operation finishes.
> >>>> - document how timeouts should be handled: this is tricky, especially with
> >>>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> >>>>   you are never able to set the control since it will stall?
> >>>>   In the end this will just reflect how UVC handles this.
> >>>
> >>> I have been catching up on this thread (I have not read the v3 and v4
> >>> threads yet).
> >>>
> >>> This all started with Ricardo noticing that ctrl->handle may get
> >>> overwritten when another app sets the ctrl, causing the first app
> >>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> >>> even though it set the ctrl itself.
> >>>
> >>> My observations so far:
> >>>
> >>> 1. This is only hit when another app changes the ctrl after the first app,
> >>> in this case, if there is no stall issued by the hw for the second app's
> >>> request, arguably the first app getting the event for the ctrl is correct
> >>> since it was changed by the second app. IOW I think the current behavior
> >>> is not only fine, but even desirable. Assuming we only override ctrl->handle
> >>> after successfully sending the set-ctrl request to the hardware.
> > 
> > I think you're right.
> > 
> >>> 2. This adds a lot of complexity for not sending an event to the app
> >>> which made the change. Hans V. suggested maybe adding some sort of flag
> >>> for async ctrls to the userspace API. I wonder if we should not just
> >>> get rid of this complexity and document that these controls will always
> >>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> >>> That would certainly simplify things, but it raises the questions if
> >>> this will cause issues for existing applications.
> >>
> >> I'm not that keen on this. That's why a new flag can come in handy since
> >> if present, then that indicates that it makes sense to specify
> >> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.
> >>
> >> This ensures that uvc follows the current v4l2 spec. It's also why I
> >> prefer that g_ctrl will just return the old value until the new value
> >> has been reached: that way the control event corresponds with the actual
> >> updating of the control value.
> >>
> >> That said, it's just my opinion and I am OK with UVC doing things a bit
> >> differently. Just be aware that I have no objection to adding an ASYNC flag,
> >> given how widely UVC is used.
> > 
> > My experience with the V4L2 control API is that we've overdesigned quite
> > a few things, and in particular control events. The independent
> > "capture" and "control panel" application model that V4L2 controls were
> > designed for is not really a good fit for the 21st century anymore. The
> > V4L2 API isn't rich enough to arbitrate between applications that are
> > not designed to collaborate, and way too rich when applications do
> > collaborate. The only two real use cases I found for control events are
> > async set completion notification, and notification of automatic changes
> > to other controls (and in particular changes to control limits) when a
> > control is set. The second use case isn't even something that we support
> > well today: to make it really usable, the change notification should be
> > *synchronous* with the control set ioctl, returning the information from
> > the same ioctl, not through asynchronous control events.
> 
> The main reason you think it is complicated is because the uvc driver does
> not use the control framework, so it has to copy all the logic in the driver.
> That's very painful. Ideally, uvc should use the control framework, but that
> would require a complete overhaul of the uvc driver.
> 
> For all other drivers the complexity is zero since it is all in the framework.

I should have been clearer, what actually concerns me the most is the
complexity of the control framework, more than the complexity of the
uvcvideo driver. The framework has grown to a point where I think you're
the only person who can understand it, which is concerning from a
maintanability point of view as changes then can't be meaningfully
reviewed. Don't get me wrong, I'm not saying you're not doing a great
job there, but it's still a concern.

> Some of the Digital Video controls (HOTPLUG, EDID_PRESENT, RXSENSE,
> POWER_PRESENT) are meant to be used with control events to inform the applications
> when these things change. But you don't deal with HDMI video, so you never see
> them in use. The control event mechanism is generic, i.e. available for all
> controls. So the use in control panels is just one use-case and it is probably
> just qv4l2 that implements it. But control events are great for anything that
> happens asynchronously.

I should have mentioned those as well. We certainly need events for any
asynchronous, well, event :-)

> What is missing is support for asynchronous event like the zoom control that
> takes time to finish the operation. Ideally I would prefer that it would operate
> like the V4L2_CID_AUTO_FOCUS_* controls. But since the current mechanism is
> already in use in UVC I am fine with the current uvc approach. I just think
> this is something that should be signaled to userspace by a flag and that it
> is properly documented.
> 
> Regarding the second use case: it's perfectly doable, but it would require a
> new ioctl. You would need really good arguments for doing that.

I'm not sure I'd consider it doable. To make this trully useful we would
need to also report the value of changed controls (including limits)
from a S_FMT or S_SELECTION call. Yes, it could be achieved, but the
cost of doing so is probably not worth it.

> > TL;DR: If I can pick an option, let's make things simpler, not more
> > complex.
> > 
> >>> Note that if we simply return -EBUSY on set until acked by a status
> >>> event we also avoid the issue of ctrl->handle getting overwritten,
> >>> but that relies on reliable status events; or requires timeout handling.
> >>>
> >>> 3. I agree with Ricardo that a timeout based approach for cameras which
> >>> to not properly send status events for async ctrls is going to be
> >>> problematic. Things like pan/tilt homing can take multiple seconds which
> >>> is really long to use as a timeout if we plan to return -EBUSY until
> >>> the timeout triggers. I think it would be better to just rely on
> >>> the hardware sending a stall, or it accepting and correctly handling
> >>> a new CUR_SET command while the previous one is still being processed.
> >>>
> >>> I guess we can track if the hw does send status events when async ctrls
> >>> complete and then do the -EBUSY thing without going out to the hw after
> >>> the first time an async ctrl has been acked by a status event.
> >>>
> >>> And then combine that with the current behavior of overwriting ctrl->handle
> >>> until the ctrl has been marked as having working status events. So:
> >>>
> >>> a) In case we do not know yet if a ctrl gets status-event acks; and
> >>> on devices without reliable status events keep current behavior.
> >>>
> >>> b) As soon as we know a ctrl has reliable status events, switch to
> >>> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> >>> being set).
> >>>
> >>> I don't like the fact that this changes the behavior after the first
> >>> status event acking an async ctrl, but I don't really see another way.
> >>>
> >>>>>>>>> - We can return the new value fromt he cache
> >>>>>>>>>
> >>>>>>>>> Returning -EBUSY would be simpler to implement.
> >>>>>>>>
> >>>>>>>> Not only easy, I think it is the most correct,
> >>>>>>>>
> >>>>>>>>> I don't think the behaviour should depend on whether the control is read
> >>>>>>>>> on the file handle that initiated the async operation or on a different
> >>>>>>>>> file handle.
> >>>>>>>>>
> >>>>>>>>> For control set, I don't think we can do much else than returning
> >>>>>>>>> -EBUSY, regardless of which file handle the control is set on.
> >>>>>>>>
> >>>>>>>> ACK.
> >>>>>>>>
> >>>>>>>>>> I will send a new version with my interpretation.
> >>>>>>>>>>
> >>>>>>>>>> Thanks for a great discussion
> >>>>>>>>
> >>>>>>>> Looking with some perspective... I believe that we should look into
> >>>>>>>> the "userspace behaviour for auto controls" in a different patchset.
> >>>>>>>> It is slightly unrelated to this discussion.
Laurent Pinchart Dec. 4, 2024, 9:10 a.m. UTC | #35
Hi Hans,

Repurposing the mail thread, trimming the CC list, and adding a few
people who I expect to be interested in the new topic.

On Wed, Dec 04, 2024 at 11:02:07AM +0200, Laurent Pinchart wrote:
> On Wed, Dec 04, 2024 at 08:59:04AM +0100, Hans Verkuil wrote:
> > On 03/12/2024 18:18, Laurent Pinchart wrote:

[snip]

> > > My experience with the V4L2 control API is that we've overdesigned quite
> > > a few things, and in particular control events. The independent
> > > "capture" and "control panel" application model that V4L2 controls were
> > > designed for is not really a good fit for the 21st century anymore. The
> > > V4L2 API isn't rich enough to arbitrate between applications that are
> > > not designed to collaborate, and way too rich when applications do
> > > collaborate. The only two real use cases I found for control events are
> > > async set completion notification, and notification of automatic changes
> > > to other controls (and in particular changes to control limits) when a
> > > control is set. The second use case isn't even something that we support
> > > well today: to make it really usable, the change notification should be
> > > *synchronous* with the control set ioctl, returning the information from
> > > the same ioctl, not through asynchronous control events.

[snip]

> > Regarding the second use case: it's perfectly doable, but it would require a
> > new ioctl. You would need really good arguments for doing that.
> 
> I'm not sure I'd consider it doable. To make this trully useful we would
> need to also report the value of changed controls (including limits)
> from a S_FMT or S_SELECTION call. Yes, it could be achieved, but the
> cost of doing so is probably not worth it.

There's another change that would hopefully be simpler (but not simple),
and would bring important improvements. Controls and subdev
format/selection rectangles use a very different try model: the former
use a TRY ioctl, while the latter use a TRY context. The TRY context
makes it possible to try a full configuration of a subdev without
changing its state, where the TRY ioctl can only support trying a
control in isolation of all other parameters. This is blocking quite a
few subdev features for which we would like to use controls, and is also
problematic for existing controls (such as h/v flip or rotation, as they
affect the format). Extending the control get/set ioctls with a "which"
parameter shouldn't be difficult, but wiring that to the control
framework is the part that scares me.

To make this possible, we will need to store information about controls
in the v4l2_subdev_state. I would like to minimize the amount of
information stored there, and the complexity required to instantiate a
new state, as that should be as lightweight as possible. To that end, we
can limit the mechanism to, for instance, forbid adding new controls to
a control handler after its initialization phase. Ideally, we shouldn't
duplicate in the subdev state any information that can't change, such as
the type of a control. Only the current value and the limits should be
stored there.

We will also need to be careful about locking. This mechanism should
protect access to control information using a state-level lock and not
locks embedded deep in the control framework, as we would then suffer
from AB-BA deadlock issues.

This new mechanism would be primarily used for subdevs in a first step,
but it should be implemented in such a way to also be usable for video
devices.

Now, the real question: how can we make this happen ? :-)
Laurent Pinchart Dec. 4, 2024, 9:13 a.m. UTC | #36
On Wed, Dec 04, 2024 at 09:04:11AM +0100, Hans Verkuil wrote:
> On 04/12/2024 08:59, Hans Verkuil wrote:
> > On 03/12/2024 18:18, Laurent Pinchart wrote:
> >> On Mon, Dec 02, 2024 at 11:55:20AM +0100, Hans Verkuil wrote:
> >>> On 02/12/2024 11:26, Hans de Goede wrote:
> >>>> On 2-Dec-24 9:44 AM, Hans Verkuil wrote:
> >>>>> On 02/12/2024 09:11, Laurent Pinchart wrote:
> >>>>>> On Mon, Dec 02, 2024 at 09:05:07AM +0100, Hans Verkuil wrote:
> >>>>>>> On 02/12/2024 01:18, Laurent Pinchart wrote:
> >>>>>>>> On Fri, Nov 29, 2024 at 11:18:54PM +0100, Ricardo Ribalda wrote:
> >>>>>>>>> On Fri, 29 Nov 2024 at 23:03, Laurent Pinchart wrote:
> >>>>>>>>>> On Fri, Nov 29, 2024 at 07:47:31PM +0100, Ricardo Ribalda wrote:
> >>>>>>>>>>> Before we all go on a well deserved weekend, let me recap what we
> >>>>>>>>>>> know. If I did not get something correctly, let me know.
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Well behaved devices do not allow to set or get an incomplete async
> >>>>>>>>>>> control. They will stall instead (ref: Figure 2-21 in UVC 1.5 )
> >>>>>>>>>>> 2) Both Laurent and Ricardo consider that there is a big chance that
> >>>>>>>>>>> some camera modules do not implement this properly. (ref: years of
> >>>>>>>>>>> crying over broken module firmware :) )
> >>>>>>>>>>>
> >>>>>>>>>>> 3) ctrl->handle is designed to point to the fh that originated the
> >>>>>>>>>>> control. So the logic can decide if the originator needs to be
> >>>>>>>>>>> notified or not. (ref: uvc_ctrl_send_event() )
> >>>>>>>>>>> 4) Right now we replace the originator in ctrl->handle for unfinished
> >>>>>>>>>>> async controls.  (ref:
> >>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/usb/uvc/uvc_ctrl.c#n2050)
> >>>>>>>>>>>
> >>>>>>>>>>> My interpretation is that:
> >>>>>>>>>>> A) We need to change 4). We shall not change the originator of
> >>>>>>>>>>> unfinished ctrl->handle.
> >>>>>>>>>>> B) Well behaved cameras do not need the patch "Do not set an async
> >>>>>>>>>>> control owned by another fh"
> >>>>>>>>>>> C) For badly behaved cameras, it is fine if we slightly break the
> >>>>>>>>>>> v4l2-compliance in corner cases, if we do not break any internal data
> >>>>>>>>>>> structure.
> >>>>>>>>>>
> >>>>>>>>>> The fact that some devices may not implement the documented behaviour
> >>>>>>>>>> correctly may not be a problem. Well-behaved devices will stall, which
> >>>>>>>>>> means we shouldn't query the device while as async update is in
> >>>>>>>>>> progress. Badly-behaved devices, whatever they do when queried, should
> >>>>>>>>>> not cause any issue if we don't query them.
> >>>>>>>>>
> >>>>>>>>> I thought we could detect the stall and return safely. Isn't that the case?
> >>>>>>>>
> >>>>>>>> We could, but if we know the device will stall anyway, is there a reason
> >>>>>>>> not to avoid issuing the request in the first place ?
> >>>>>>>>
> >>>>>>>>> Why we have not seen issues with this?
> >>>>>>>>
> >>>>>>>> I haven't tested a PTZ device for a very long time, and you would need
> >>>>>>>> to hit a small time window to see the issue.
> >>>>>>>>
> >>>>>>>>>> We should not send GET_CUR and SET_CUR requests to the device while an
> >>>>>>>>>> async update is in progress, and use cached values instead. When we
> >>>>>>>>>> receive the async update event, we should clear the cache. This will be
> >>>>>>>>>> the same for both well-behaved and badly-behaved devices, so we can
> >>>>>>>>>> expose the same behaviour towards userspace.
> >>>>>>>>>
> >>>>>>>>> seting ctrl->loaded = 0 when we get an event sounds like a good idea
> >>>>>>>>> and something we can implement right away.
> >>>>>>>>> If I have to resend the set I will add it to the end.
> >>>>>>>>>
> >>>>>>>>>> We possibly also need some kind of timeout mechanism to cope with the
> >>>>>>>>>> async update event not being delivered by the device.
> >>>>>>>>>
> >>>>>>>>> This is the part that worries me the most:
> >>>>>>>>> - timeouts make the code fragile
> >>>>>>>>> - What is a good value for timeout? 1 second, 30, 300? I do not think
> >>>>>>>>> that we can find a value.
> >>>>>>>>
> >>>>>>>> I've been thinking about the implementation of uvc_fh cleanup over the
> >>>>>>>> weekend, and having a timeout would have the nice advantage that we
> >>>>>>>> could reference-count uvc_fh instead of implementing a cleanup that
> >>>>>>>> walks over all controls when closing a file handle. I think it would
> >>>>>>>> make the code simpler, and possibly safer too.
> >>>>>>>>
> >>>>>>>>>> Regarding the userspace behaviour during an auto-update, we have
> >>>>>>>>>> multiple options:
> >>>>>>>>>>
> >>>>>>>>>> For control get,
> >>>>>>>>>>
> >>>>>>>>>> - We can return -EBUSY
> >>>>>>>>>> - We can return the old value from the cache
> >>>>>>>
> >>>>>>> This would match the control behavior best. Only when the operation is
> >>>>>>> done is the control updated and the control event sent.
> >>>>>>>
> >>>>>>> Some questions: is any of this documented for UVC? Because this is non-standard
> >>>>>>
> >>>>>> No this isn't documented.
> >>>>>>
> >>>>>>> behavior. Are there applications that rely on this? Should we perhaps add
> >>>>>>
> >>>>>> I don't know.
> >>>>>>
> >>>>>>> proper support for this to the control framework? E.g. add an ASYNC flag and
> >>>>>>> document this?
> >>>>>>
> >>>>>> We could, but this is such a specific use case that I don't think is
> >>>>>> worth adding complexity to the already complex control framework would
> >>>>>> be worth it. What we could do is perhaps adding a flag for the userspace
> >>>>>> API, but even there, I never like modelling an API with a single user.
> >>>>>
> >>>>> Well, it might be a single driver that uses this, but it is also the most
> >>>>> used driver by far. I think the only change is to add a flag for this and
> >>>>> describe how it should behave. And add v4l2-compliance tests for it.
> >>>>>
> >>>>> Otherwise no changes to the control framework are needed, I think.
> >>>>>
> >>>>> Controls with the ASYNC flag set would:
> >>>>>
> >>>>> - return the old value from the cache.
> >>>>> - document that setting a new value while the operation is in progress
> >>>>>   results in EBUSY. Document that if the new value is equal to the old value,
> >>>>>   then return 0 and do nothing (alternative is to just immediately send
> >>>>>   the control changed event, but that might require a control framework change).
> >>>>> - when the operation finishes, update the cache to the new value and
> >>>>>   send the control changed event.
> >>>>> - document that userspace should specify V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK
> >>>>>   when subscribing to the control if you calling fh wants to know when
> >>>>>   the operation finishes.
> >>>>> - document how timeouts should be handled: this is tricky, especially with
> >>>>>   bad hardware. I.e. if the hw doesn't send the event, does that mean that
> >>>>>   you are never able to set the control since it will stall?
> >>>>>   In the end this will just reflect how UVC handles this.
> >>>>
> >>>> I have been catching up on this thread (I have not read the v3 and v4
> >>>> threads yet).
> >>>>
> >>>> This all started with Ricardo noticing that ctrl->handle may get
> >>>> overwritten when another app sets the ctrl, causing the first app
> >>>> to set the ctrl to get a V4L2_EVENT for the ctrl (if subscribed)
> >>>> even though it set the ctrl itself.
> >>>>
> >>>> My observations so far:
> >>>>
> >>>> 1. This is only hit when another app changes the ctrl after the first app,
> >>>> in this case, if there is no stall issued by the hw for the second app's
> >>>> request, arguably the first app getting the event for the ctrl is correct
> >>>> since it was changed by the second app. IOW I think the current behavior
> >>>> is not only fine, but even desirable. Assuming we only override ctrl->handle
> >>>> after successfully sending the set-ctrl request to the hardware.
> >>
> >> I think you're right.
> >>
> >>>> 2. This adds a lot of complexity for not sending an event to the app
> >>>> which made the change. Hans V. suggested maybe adding some sort of flag
> >>>> for async ctrls to the userspace API. I wonder if we should not just
> >>>> get rid of this complexity and document that these controls will always
> >>>> generate events independent of V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK ?
> >>>> That would certainly simplify things, but it raises the questions if
> >>>> this will cause issues for existing applications.
> >>>
> >>> I'm not that keen on this. That's why a new flag can come in handy since
> >>> if present, then that indicates that it makes sense to specify
> >>> V4L2_EVENT_SUB_FL_ALLOW_FEEDBACK when subscribing to the control events.
> >>>
> >>> This ensures that uvc follows the current v4l2 spec. It's also why I
> >>> prefer that g_ctrl will just return the old value until the new value
> >>> has been reached: that way the control event corresponds with the actual
> >>> updating of the control value.
> >>>
> >>> That said, it's just my opinion and I am OK with UVC doing things a bit
> >>> differently. Just be aware that I have no objection to adding an ASYNC flag,
> >>> given how widely UVC is used.
> >>
> >> My experience with the V4L2 control API is that we've overdesigned quite
> >> a few things, and in particular control events. The independent
> >> "capture" and "control panel" application model that V4L2 controls were
> >> designed for is not really a good fit for the 21st century anymore. The
> >> V4L2 API isn't rich enough to arbitrate between applications that are
> >> not designed to collaborate, and way too rich when applications do
> >> collaborate. The only two real use cases I found for control events are
> >> async set completion notification, and notification of automatic changes
> >> to other controls (and in particular changes to control limits) when a
> >> control is set. The second use case isn't even something that we support
> >> well today: to make it really usable, the change notification should be
> >> *synchronous* with the control set ioctl, returning the information from
> >> the same ioctl, not through asynchronous control events.
> > 
> > The main reason you think it is complicated is because the uvc driver does
> > not use the control framework, so it has to copy all the logic in the driver.
> > That's very painful. Ideally, uvc should use the control framework, but that
> > would require a complete overhaul of the uvc driver.
> > 
> > For all other drivers the complexity is zero since it is all in the framework.
> > 
> > Some of the Digital Video controls (HOTPLUG, EDID_PRESENT, RXSENSE,
> > POWER_PRESENT) are meant to be used with control events to inform the applications
> > when these things change. But you don't deal with HDMI video, so you never see
> > them in use. The control event mechanism is generic, i.e. available for all
> > controls. So the use in control panels is just one use-case and it is probably
> > just qv4l2 that implements it. But control events are great for anything that
> > happens asynchronously.
> > 
> > What is missing is support for asynchronous event like the zoom control that
> > takes time to finish the operation. Ideally I would prefer that it would operate
> > like the V4L2_CID_AUTO_FOCUS_* controls. But since the current mechanism is
> > already in use in UVC I am fine with the current uvc approach. I just think
> > this is something that should be signaled to userspace by a flag and that it
> > is properly documented.
> 
> Sorry for this second post, I just wanted to say that in my opinion it is OK if
> frameworks are complicated internally. That's the whole point of a framework: to
> put the complexity in one place and hide it from the users of the framework.
> 
> If a framework was simple, then you probably wouldn't have needed a framework in
> the first place. The problem with uvc is that you can't use the framework so all
> the complexity now enters the driver :-(

As I mentioned in a separate e-mail, I'm less concerned about the code
in the uvcvideo driver than in the control framework. I'm quite sure you
have the opposite opinion, and that's not too surprising. I'm quite
familiar with the uvcvideo internals, while you know the control
framework well :-)

We have lots of complexity in the control framework that comes from
complexity in the controls API. I'm less criticizing the implementation
of the control framework than the design of the API. History partly
explains some of the design decisions, and if we had to redo this today,
I'm sure the result would be very different. V4L3 will have a nicer
control API.

> > Regarding the second use case: it's perfectly doable, but it would require a
> > new ioctl. You would need really good arguments for doing that.
> > 
> >> TL;DR: If I can pick an option, let's make things simpler, not more
> >> complex.
> >>
> >>>> Note that if we simply return -EBUSY on set until acked by a status
> >>>> event we also avoid the issue of ctrl->handle getting overwritten,
> >>>> but that relies on reliable status events; or requires timeout handling.
> >>>>
> >>>> 3. I agree with Ricardo that a timeout based approach for cameras which
> >>>> to not properly send status events for async ctrls is going to be
> >>>> problematic. Things like pan/tilt homing can take multiple seconds which
> >>>> is really long to use as a timeout if we plan to return -EBUSY until
> >>>> the timeout triggers. I think it would be better to just rely on
> >>>> the hardware sending a stall, or it accepting and correctly handling
> >>>> a new CUR_SET command while the previous one is still being processed.
> >>>>
> >>>> I guess we can track if the hw does send status events when async ctrls
> >>>> complete and then do the -EBUSY thing without going out to the hw after
> >>>> the first time an async ctrl has been acked by a status event.
> >>>>
> >>>> And then combine that with the current behavior of overwriting ctrl->handle
> >>>> until the ctrl has been marked as having working status events. So:
> >>>>
> >>>> a) In case we do not know yet if a ctrl gets status-event acks; and
> >>>> on devices without reliable status events keep current behavior.
> >>>>
> >>>> b) As soon as we know a ctrl has reliable status events, switch to
> >>>> returning -EBUSY if a set is pending (as indicated by ctrl->handle
> >>>> being set).
> >>>>
> >>>> I don't like the fact that this changes the behavior after the first
> >>>> status event acking an async ctrl, but I don't really see another way.
> >>>>
> >>>>>>>>>> - We can return the new value fromt he cache
> >>>>>>>>>>
> >>>>>>>>>> Returning -EBUSY would be simpler to implement.
> >>>>>>>>>
> >>>>>>>>> Not only easy, I think it is the most correct,
> >>>>>>>>>
> >>>>>>>>>> I don't think the behaviour should depend on whether the control is read
> >>>>>>>>>> on the file handle that initiated the async operation or on a different
> >>>>>>>>>> file handle.
> >>>>>>>>>>
> >>>>>>>>>> For control set, I don't think we can do much else than returning
> >>>>>>>>>> -EBUSY, regardless of which file handle the control is set on.
> >>>>>>>>>
> >>>>>>>>> ACK.
> >>>>>>>>>
> >>>>>>>>>>> I will send a new version with my interpretation.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for a great discussion
> >>>>>>>>>
> >>>>>>>>> Looking with some perspective... I believe that we should look into
> >>>>>>>>> the "userspace behaviour for auto controls" in a different patchset.
> >>>>>>>>> It is slightly unrelated to this discussion.
Hans de Goede Dec. 4, 2024, 1:46 p.m. UTC | #37
Hi,

On 3-Dec-24 8:32 PM, Laurent Pinchart wrote:
> On Mon, Dec 02, 2024 at 02:29:24PM +0100, Ricardo Ribalda wrote:
>> On Mon, 2 Dec 2024 at 13:19, Hans de Goede wrote:
>>> On 2-Dec-24 11:50 AM, Ricardo Ribalda wrote:
>>>> On Mon, 2 Dec 2024 at 11:27, Hans de Goede wrote:

<snip>

>>>>> Note that if we simply return -EBUSY on set until acked by a status
>>>>> event we also avoid the issue of ctrl->handle getting overwritten,
>>>>> but that relies on reliable status events; or requires timeout handling.
>>>>>
>>>>> 3. I agree with Ricardo that a timeout based approach for cameras which
>>>>> to not properly send status events for async ctrls is going to be
>>>>> problematic. Things like pan/tilt homing can take multiple seconds which
>>>>> is really long to use as a timeout if we plan to return -EBUSY until
>>>>> the timeout triggers. I think it would be better to just rely on
>>>>> the hardware sending a stall, or it accepting and correctly handling
>>>>> a new CUR_SET command while the previous one is still being processed.
>>>>>
>>>>> I guess we can track if the hw does send status events when async ctrls
>>>>> complete and then do the -EBUSY thing without going out to the hw after
>>>>> the first time an async ctrl has been acked by a status event.
> 
> That sounds quite complex, and wouldn't guard against the status event
> being occasionally lost. I'm more concerned about devices that only
> occasionally mess up sending the status event, not devices that never
> send it.

I did wonder if we would see devices where the status event is
occasionally lost.

I think that patches 1-4 of "[PATCH v6 0/5] media: uvcvideo: Two +1 fixes for
async controls" solves the issue at hand nicely, so lets go with that.

And I hereby withdraw my proposal for a per ctrl flag to track if
we get status events for that control, because as you say that will
not work in the case where the status events are missing some of
the time (rather then the status events simply being never send).

Regards,

Hans
diff mbox series

Patch

diff --git a/drivers/media/usb/uvc/uvc_ctrl.c b/drivers/media/usb/uvc/uvc_ctrl.c
index b6af4ff92cbd..3f8ae35cb3bc 100644
--- a/drivers/media/usb/uvc/uvc_ctrl.c
+++ b/drivers/media/usb/uvc/uvc_ctrl.c
@@ -1955,6 +1955,10 @@  int uvc_ctrl_set(struct uvc_fh *handle,
 	if (!(ctrl->info.flags & UVC_CTRL_FLAG_SET_CUR))
 		return -EACCES;
 
+	/* Other file handle is waiting a response from this async control. */
+	if (ctrl->handle && ctrl->handle != handle)
+		return -EBUSY;
+
 	/* Clamp out of range values. */
 	switch (mapping->v4l2_type) {
 	case V4L2_CTRL_TYPE_INTEGER: