diff mbox series

USB: cdc-wdm: Remove unsafe wdm_flush().

Message ID 1597188375-4787-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Headers show
Series USB: cdc-wdm: Remove unsafe wdm_flush(). | expand

Commit Message

Tetsuo Handa Aug. 11, 2020, 11:26 p.m. UTC
syzbot is reporting hung task at wdm_flush() [1], for there is a circular
dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
cannot be called unless close() for /dev/cdc-wdm0 completes.

It turned out that wdm_flush() is broken in multiple aspects.

(1) Since multiple threads can concurrently call wait_event() from
    wdm_flush() using multiple file descriptors which refer the same
    "desc", we need to use wake_up_all() when clearing WDM_IN_USE
    in order to make sure that all waiters are woken up.

But such change did not help because it turned out that the root cause is
the circular dependency described above.

I considered that such circular dependency is an usage error [2] which
corresponds to an unresponding broken hardware [3]. But Alan Stern
responded that we should be prepared for such hardware [4]. Therefore,
we tried to handle an unresponding broken hardware.

(2) Since multiple threads can share same "desc" using multiple file
    descriptors, there is no guarantee that failure notice received via
    wdm_flush() via some file descriptor corresponds to usb_submit_urb()
    request from wdm_write() call from that file descriptor.

(3) Since wdm_flush() is called without locks held, it is not safe to
    dereference desc->intf after checking that WDM_DISCONNECTING is not
    set [5].

We could change wdm_write() to synchronously wait for response [6] in
order to make sure that a thread which issued usb_submit_urb() request
will receive failure notice for that request. But Oliver Neukum responded
that such change impacts performance and timers. I tried to know
how heavily wdm requests/responses are processed, but I could not receive
response from developers [7].

One year has elapsed since this bug was reported, and this bug is the
third top crasher for syzbot. Waiting for another year without progress
is a waste of resource.

Therefore, for now, simply remove wdm_flush() completely at the cost of
giving up ability to receive failure notice for somebody's
usb_submit_urb() request from somebody's wdm_write() call.

Since I can't touch wdm_write() without knowing how heavily wdm
requests/responses are processed, this patch does not handle unresponding
broken hardware in wdm_write() which forever waits for an response from
such hardware issued by previous usb_submit_urb() request from previous
wdm_write() call [5].

[1] https://syzkaller.appspot.com/bug?id=e7b761593b23eb50855b9ea31e3be5472b711186
[2] https://lkml.kernel.org/r/27b7545e-8f41-10b8-7c02-e35a08eb1611@i-love.sakura.ne.jp
[3] https://lkml.kernel.org/r/79ba410f-e0ef-2465-b94f-6b9a4a82adf5@i-love.sakura.ne.jp
[4] https://lkml.kernel.org/r/20200530011040.GB12419@rowland.harvard.edu
[5] https://lkml.kernel.org/r/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp
[6] https://lkml.kernel.org/r/b347b882-a986-24c6-2b37-0b1a092931b9@i-love.sakura.ne.jp
[7] https://lkml.kernel.org/r/f6de3d3a-6825-1904-65f4-8d96594a9846@i-love.sakura.ne.jp

Reported-by: syzbot <syzbot+854768b99f19e89d7f81@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Oliver Neukum <oneukum@suse.com>
---
 drivers/usb/class/cdc-wdm.c | 25 -------------------------
 1 file changed, 25 deletions(-)

Comments

Oliver Neukum Aug. 12, 2020, 8:02 a.m. UTC | #1
Am Mittwoch, den 12.08.2020, 08:26 +0900 schrieb Tetsuo Handa:
> syzbot is reporting hung task at wdm_flush() [1], for there is a circular
> dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
> waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
> cannot be called unless close() for /dev/cdc-wdm0 completes.
> 
> It turned out that wdm_flush() is broken in multiple aspects.

Sorry,

this seems to be a miscommunication. I was under the impression that
you were testing patches. I will push them upstream. You
cannot just remove flush() without impairing error handling.

	Regards
		Oliver


Nacked-by: Oliver Neukum <oneukum@suse.com>
Tetsuo Handa Aug. 12, 2020, 10:09 a.m. UTC | #2
On 2020/08/12 17:02, Oliver Neukum wrote:
> Am Mittwoch, den 12.08.2020, 08:26 +0900 schrieb Tetsuo Handa:
>> syzbot is reporting hung task at wdm_flush() [1], for there is a circular
>> dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
>> waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
>> cannot be called unless close() for /dev/cdc-wdm0 completes.
>>
>> It turned out that wdm_flush() is broken in multiple aspects.
> 
> Sorry,
> 
> this seems to be a miscommunication. I was under the impression that
> you were testing patches. I will push them upstream. You
> cannot just remove flush() without impairing error handling.

Then, will you check

  https://lore.kernel.org/linux-usb/254939d4-f3a1-8c7e-94e5-9862c02774fa@i-love.sakura.ne.jp/

and

  https://lore.kernel.org/linux-usb/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp/

before pushing upstream?

> 
> 	Regards
> 		Oliver
> 
> 
> Nacked-by: Oliver Neukum <oneukum@suse.com>
>
Oliver Neukum Aug. 12, 2020, 11:19 a.m. UTC | #3
Am Mittwoch, den 12.08.2020, 19:09 +0900 schrieb Tetsuo Handa:
> On 2020/08/12 17:02, Oliver Neukum wrote:
> > Am Mittwoch, den 12.08.2020, 08:26 +0900 schrieb Tetsuo Handa:
> > > syzbot is reporting hung task at wdm_flush() [1], for there is a circular
> > > dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
> > > waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
> > > cannot be called unless close() for /dev/cdc-wdm0 completes.
> > > 
> > > It turned out that wdm_flush() is broken in multiple aspects.
> > 
> > Sorry,
> > 
> > this seems to be a miscommunication. I was under the impression that
> > you were testing patches. I will push them upstream. You
> > cannot just remove flush() without impairing error handling.
> 
> Then, will you check
> 
>   https://lore.kernel.org/linux-usb/254939d4-f3a1-8c7e-94e5-9862c02774fa@i-love.sakura.ne.jp/
> 
> and
> 
>   https://lore.kernel.org/linux-usb/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp/
> 
> before pushing upstream?

Hi,

sure, yes, thank you for the pointers. I shall post an RFC in a few
minutes.

	Regards
		Oliver
Tetsuo Handa Aug. 27, 2020, 1:39 p.m. UTC | #4
On 2020/08/12 20:19, Oliver Neukum wrote:
> Am Mittwoch, den 12.08.2020, 19:09 +0900 schrieb Tetsuo Handa:
>> On 2020/08/12 17:02, Oliver Neukum wrote:
>>> Am Mittwoch, den 12.08.2020, 08:26 +0900 schrieb Tetsuo Handa:
>>>> syzbot is reporting hung task at wdm_flush() [1], for there is a circular
>>>> dependency that wdm_flush() from flip_close() for /dev/cdc-wdm0 forever
>>>> waits for /dev/raw-gadget to be closed while close() for /dev/raw-gadget
>>>> cannot be called unless close() for /dev/cdc-wdm0 completes.
>>>>
>>>> It turned out that wdm_flush() is broken in multiple aspects.
>>>
>>> Sorry,
>>>
>>> this seems to be a miscommunication. I was under the impression that
>>> you were testing patches. I will push them upstream. You
>>> cannot just remove flush() without impairing error handling.
>>
>> Then, will you check
>>
>>   https://lore.kernel.org/linux-usb/254939d4-f3a1-8c7e-94e5-9862c02774fa@i-love.sakura.ne.jp/
>>
>> and
>>
>>   https://lore.kernel.org/linux-usb/c85331fc-874c-6e46-a77f-0ef1dc075308@i-love.sakura.ne.jp/
>>
>> before pushing upstream?
> 
> Hi,
> 
> sure, yes, thank you for the pointers. I shall post an RFC in a few
> minutes.

Oliver, are you there? Did you get my comment at
https://lkml.kernel.org/r/ee0af733-903f-8e8f-8027-b5490a37032f@i-love.sakura.ne.jp ?

I think that we cannot avoid impairing error handling unless we change wdm_write()
to wait for response with some timeout.
diff mbox series

Patch

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 7f5de95..dd7259a 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -583,30 +583,6 @@  static int service_outstanding_interrupt(struct wdm_device *desc)
 	return rv;
 }
 
-static int wdm_flush(struct file *file, fl_owner_t id)
-{
-	struct wdm_device *desc = file->private_data;
-
-	wait_event(desc->wait,
-			/*
-			 * needs both flags. We cannot do with one
-			 * because resetting it would cause a race
-			 * with write() yet we need to signal
-			 * a disconnect
-			 */
-			!test_bit(WDM_IN_USE, &desc->flags) ||
-			test_bit(WDM_DISCONNECTING, &desc->flags));
-
-	/* cannot dereference desc->intf if WDM_DISCONNECTING */
-	if (test_bit(WDM_DISCONNECTING, &desc->flags))
-		return -ENODEV;
-	if (desc->werr < 0)
-		dev_err(&desc->intf->dev, "Error in flush path: %d\n",
-			desc->werr);
-
-	return usb_translate_errors(desc->werr);
-}
-
 static __poll_t wdm_poll(struct file *file, struct poll_table_struct *wait)
 {
 	struct wdm_device *desc = file->private_data;
@@ -730,7 +706,6 @@  static long wdm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	.read =		wdm_read,
 	.write =	wdm_write,
 	.open =		wdm_open,
-	.flush =	wdm_flush,
 	.release =	wdm_release,
 	.poll =		wdm_poll,
 	.unlocked_ioctl = wdm_ioctl,