Message ID | 20241203205249.513f1153@foxbook (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | usb: xhci: Fix NULL pointer dereference on certain command aborts | expand |
I confirmed that the bug is real and behaves exactly as expected, using a USB microcontroller programmed to NAK the status stage of SET_ADDRESS requests forever and to reconnect if the host gives up enumerating it. Command timeout was reduced to 500ms to sooner reach the segment's end and some relevant debug info was added, hopefully self-explanatory: [ +0,378926] usb 10-1: new full-speed USB device number 109 using xhci_hcd [ +0,501006] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671bff0 deq ffff88814671b000 [ +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command [ +0,000005] xhci_hcd 0000:03:00.0: !!! avoiding dereferencing a NULL pointer !!! [ +0,712001] xhci_hcd 0000:03:00.0: cur_cmd 0000000000000000 enq ffff88814671b010 deq ffff88814671b010 [ +0,000001] xhci_hcd 0000:03:00.0: Timeout while waiting for setup device command [ +0,207981] usb 10-1: device not accepting address 109, error -62 The driver and host controller continue working normally after one hour of testing and several avoided crashes. The only thing I haven't tried is actually crashing the kernel, but considering what's inside xhci_mod_cmd_timer() I think it's obvious that this is exactly what would happen next without this patch. Regards, Michal
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4cf5363875c7..da26e317ab0c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -422,7 +422,8 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci, if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) && !(xhci->xhc_state & XHCI_STATE_DYING)) { xhci->current_cmd = cur_cmd; - xhci_mod_cmd_timer(xhci); + if (cur_cmd) + xhci_mod_cmd_timer(xhci); xhci_ring_cmd_db(xhci); } }
xhci_handle_stopped_cmd_ring() attempts to restart the command ring if there are pending TRBs on it, which it detects by comparing the enqueue and dequeue pointers for equality. It assumes that pending TRBs imply a pending command, and blindly dereferences cur_cmd. This can be wrong. If a command is queued to the final usable TRB of a ring segment, the enqueue pointer is advanced to the subsequent link TRB and no further. If the command is later aborted, when the abort completion is handled the dequeue pointer is advanced to the first TRB of the next segment. If no further commands are queued, the pointers stay this way and then xhci_handle_stopped_cmd_ring() is called by xhci_abort_cmd_ring() with NULL cur_cmd, which triggers cur_cmd dereference as described above. Fix this by omitting timer setup if cur_cmd is NULL. Leave the rest unchanged, including ringing the doorbell each time the ring pointers aren't equal. An unnecessary doorbell ring should be harmless. This is probably Bug 219532, but no confirmation has been received. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219532 Fixes: c311e391a7ef ("xhci: rework command timeout and cancellation,") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> --- drivers/usb/host/xhci-ring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)