xhci: Handle TD clearing for multiple streams case

When multiple streams are in use, multiple TDs might be in flight when
an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for
each, to ensure everything is reset properly and the caches cleared.
Change the logic so that any N>1 TDs found active for different streams
are deferred until after the first one is processed, calling
xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to
queue another command until we are done with all of them. Also change
the error/"should never happen" paths to ensure we at least clear any
affected TDs, even if we can't issue a command to clear the hardware
cache, and complain loudly with an xhci_warn() if this ever happens.
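
The deferral scheme described above can be pictured with a toy model.
This is only a sketch under stated assumptions, not the code in this
patch: `toy_td`, `toy_ep` and the `toy_*` functions are hypothetical
stand-ins for the driver's structures, each stream is assumed to have at
most one cancelled TD, and command completion is simulated by a direct
function call rather than a command completion event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for a cancelled TD in this toy model. */
enum toy_td_state { TD_DIRTY, TD_CLEARING, TD_DEFERRED, TD_CLEARED };

struct toy_td {
	unsigned int stream_id;     /* stream ring the TD lives on */
	enum toy_td_state state;
};

struct toy_ep {
	struct toy_td *tds;
	size_t num_tds;
	bool cmd_pending;           /* one Set TR Dequeue Pointer in flight */
	unsigned int cmds_issued;   /* how many commands were queued in total */
};

/* Queue a command for the first TD that needs one; defer the rest. */
static void toy_invalidate_cancelled_tds(struct toy_ep *ep)
{
	for (size_t i = 0; i < ep->num_tds; i++) {
		struct toy_td *td = &ep->tds[i];

		if (td->state != TD_DIRTY && td->state != TD_DEFERRED)
			continue;

		if (!ep->cmd_pending) {
			td->state = TD_CLEARING;
			ep->cmd_pending = true;
			ep->cmds_issued++;
		} else {
			/* A command is already in flight: handle this TD later. */
			td->state = TD_DEFERRED;
		}
	}
}

/* Simulated completion handler: finish one TD, then pick up deferred ones. */
static void toy_handle_cmd_set_deq(struct toy_ep *ep)
{
	assert(ep->cmd_pending);

	for (size_t i = 0; i < ep->num_tds; i++)
		if (ep->tds[i].state == TD_CLEARING)
			ep->tds[i].state = TD_CLEARED;

	ep->cmd_pending = false;
	toy_invalidate_cancelled_tds(ep);
}

/* Drive the model until no command is pending; returns commands issued. */
static unsigned int toy_cancel_all(struct toy_ep *ep)
{
	toy_invalidate_cancelled_tds(ep);
	while (ep->cmd_pending)
		toy_handle_cmd_set_deq(ep);
	return ep->cmds_issued;
}
```

In this model, three cancelled TDs on three different streams result in
three commands issued one after another, and every TD ends up cleared.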

This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct
assumptions about number of rings per endpoint.") early on in the XHCI
driver's life, when stream support was first added. At that point, this
condition would cause TDs to not be given back at all, causing hanging
transfers - but no security bug. It was then identified but not fixed
nor made into a warning in commit 674f8438c121 ("xhci: split handling
halted endpoints into two steps"), which added a FIXME comment for the
problem case (without materially changing the behavior as far as I can
tell, though the new logic made the problem more obvious).

Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some
cached cancelled URBs."), it was acknowledged again. This commit was
unfortunately not reviewed at all, as it was authored by the maintainer
directly. Had it been, perhaps a second set of eyes would've noticed
that it does not fix the bug, but rather just makes it (much) worse.
It turns the "transfers hang" bug into a "random memory corruption" bug,
by blindly marking TDs as complete without actually clearing them at all
nor moving the dequeue pointer past them, which means they aren't actually
complete, and the xHC will try to transfer data to/from them when the
endpoint resumes, now to freed memory buffers.
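
To make the failure mode concrete, here is a toy model of a single
stream ring. It is a sketch under stated assumptions rather than driver
code: `toy_trb`, `toy_ring` and the `toy_*` functions are hypothetical,
each TD is a single TRB, and "clearing" a TD means turning it into a
no-op and moving the dequeue pointer past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_trb {
	bool buffer_mapped;   /* DMA buffer still valid for this TRB */
	bool is_noop;         /* TRB was turned into a no-op */
};

struct toy_ring {
	struct toy_trb *trbs;
	size_t num_trbs;
	size_t hw_dequeue;    /* where the controller resumes from */
};

/*
 * Give a cancelled TD back to its submitter, which unmaps its buffer.
 * With clear_td == false the TD is only marked complete in software,
 * which models the behavior criticized above.
 */
static void toy_giveback(struct toy_ring *ring, size_t idx, bool clear_td)
{
	if (clear_td) {
		ring->trbs[idx].is_noop = true;
		if (ring->hw_dequeue == idx)
			ring->hw_dequeue = idx + 1;
	}
	ring->trbs[idx].buffer_mapped = false;
}

/* Count transfers the controller would attempt on unmapped buffers. */
static unsigned int toy_resume(const struct toy_ring *ring)
{
	unsigned int stale = 0;

	for (size_t i = ring->hw_dequeue; i < ring->num_trbs; i++)
		if (!ring->trbs[i].is_noop && !ring->trbs[i].buffer_mapped)
			stale++;
	return stale;
}
```

Giving back the first TD without clearing it leaves one stale transfer
for the controller to attempt on resume; clearing it first leaves none.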

This could have been a legitimate oversight, but apparently the commit
author was aware of the problem (yet still chose to submit it): it was
still mentioned as a FIXME, an xhci_dbg() was added to log the problem
condition, and the remaining issue was mentioned in the commit
description. The choice of making the log type xhci_dbg() for what is,
at this point, a completely unhandled and known broken condition is
puzzling and unfortunate, as it guarantees that no actual users would
see the log in production, thereby making it nigh undebuggable (indeed,
even if you turn on DEBUG, the message doesn't really hint at there
being a problem at all).

It took me *months* of random xHC crashes to finally find a reliable
repro and be able to do a deep dive debug session, which could all have
been avoided had this unhandled, broken condition actually been reported
with a warning, as any bug knowingly left unfixed should be (never mind
that it shouldn't have been left in at all).

> Another fix to solve clearing the caches of all stream rings with
> cancelled TDs is needed, but not as urgent.

3 years after that statement and 14 years after the original bug was
introduced, I think it's finally time to fix it. And maybe next time
let's not leave bugs unfixed (that are actually worse than the original
bug), and let's actually get people to review kernel commits please.

Fixes xHC crashes and IOMMU faults with UAS devices when handling
errors/faults. Easiest repro is to use `hdparm` to mark an early sector
(e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop.
At least in the case of JMicron controllers, the read errors end up
having to cancel two TDs (for two queued requests to different streams)
and the one that didn't get cleared properly ends up faulting the xHC
entirely when it tries to access DMA pages that have since been unmapped,
referred to by the stale TDs. This normally happens quickly (after two
or three loops). After this fix, I left the `cat` in a loop running
overnight and experienced no xHC failures, with all read errors
recovered properly. Repro'd and tested on an Apple M1 Mac Mini
(dwc3 host).

On systems without an IOMMU, this bug would instead silently corrupt
freed memory, making this a security bug (even on systems with IOMMUs
this could silently corrupt memory belonging to other USB devices on the
same controller, so it's still a security bug). Given that the kernel
autoprobes partition tables, I'm pretty sure a malicious USB device
pretending to be a UAS device and reporting an error with the right
timing could deliberately trigger a UAF and write to freed memory, with
no user action.

Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.")
Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.")
Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps")
Cc: stable@vger.kernel.org
Cc: security@kernel.org
Signed-off-by: Hector Martin <marcan@marcan.st>
---
 drivers/usb/host/xhci-ring.c | 54 +++++++++++++++++++++++++++++++++++---------
 drivers/usb/host/xhci.h      |  1 +
 2 files changed, 44 insertions(+), 11 deletions(-)

---
base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
change-id: 20240524-xhci-streams-124e88db52e6

Best regards,

Message-Id: <20240524-xhci-streams-v1-1-6b1f13819bea@marcan.st>
From: Hector Martin <marcan@marcan.st>
Date: Fri, 24 May 2024 19:28:36 +0900
Subject: [PATCH] xhci: Handle TD clearing for multiple streams case
To: Mathias Nyman <mathias.nyman@intel.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev, stable@vger.kernel.org, security@kernel.org, Hector Martin <marcan@marcan.st>
