Message ID | 20230117131839.1138208-1-maze@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c6ec929595c7443250b2a4faea988c62019d5cd2 |
Headers | show |
Series | usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate() | expand |
On Tue, Jan 17, 2023 at 05:18:39AM -0800, Maciej Żenczykowski wrote: > In Google internal bug 265639009 we've received an (as yet) unreproducible > crash report from an aarch64 GKI 5.10.149-android13 running device. > > AFAICT the source code is at: > https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13-5.10 > > The call stack is: > ncm_close() -> ncm_notify() -> ncm_do_notify() > with the crash at: > ncm_do_notify+0x98/0x270 > Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b) > > Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...): > > // halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification) > 0B 0D 00 79 strh w11, [x8, #6] > > // word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request) > 6C 0A 00 B9 str w12, [x19, #8] > > // x10 (NULL) was read here from offset 0 of valid pointer x9 > // IMHO we're reading 'cdev->gadget' and getting NULL > // gadget is indeed at offset 0 of struct usb_composite_dev > 2A 01 40 F9 ldr x10, [x9] > > // loading req->buf pointer, which is at offset 0 of struct usb_request > 69 02 40 F9 ldr x9, [x19] > > // x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed > 4B 5D 40 B9 ldr w11, [x10, #0x5c] > > which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment: > > event->wLength = cpu_to_le16(8); > req->length = NCM_STATUS_BYTECOUNT; > > /* SPEED_CHANGE data is up/down speeds in bits/sec */ > data = req->buf + sizeof *event; > data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); > > My analysis of registers and NULL ptr deref crash offset > (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c) > heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing: > data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); > which calls: > ncm_bitrate(NULL) > which then calls: > gadget_is_superspeed(NULL) > which reads > ((struct usb_gadget *)NULL)->max_speed > and hits a panic. > > AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C. > (remember there's a GKI KABI reservation of 16 bytes in struct work_struct) > > It's not at all clear to me how this is all supposed to work... > but returning 0 seems much better than panic-ing... > > Cc: Felipe Balbi <balbi@kernel.org> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > Cc: Lorenzo Colitti <lorenzo@google.com> > Cc: Carlos Llamas <cmllamas@google.com> > Cc: stable@vger.kernel.org > Signed-off-by: Maciej Żenczykowski <maze@google.com> > --- > drivers/usb/gadget/function/f_ncm.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c > index c36bcfa0e9b4..424bb3b666db 100644 > --- a/drivers/usb/gadget/function/f_ncm.c > +++ b/drivers/usb/gadget/function/f_ncm.c > @@ -83,7 +83,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) > /* peak (theoretical) bulk transfer rate in bits-per-second */ > static inline unsigned ncm_bitrate(struct usb_gadget *g) > { > - if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) > + if (!g) > + return 0; > + else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) > return 4250000000U; > else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) > return 3750000000U; This looks like the wrong place to fix things - if this case is hit, don't we go on to call usb_eq_queue() which can't be valid if the gadget has been destroyed? I don't see how cdev->gadget can be set to null without cdev being freed so is this actually a use-after-free not a simple null-dereference? My guess is that somehow the gadget is being destroyed while the network interface is held open (we've seen similar issues in other, non-network, gadget functions), but I don't know enough about the network side of things to know how to cause that from userspace. John
> This looks like the wrong place to fix things - if this case is hit, > don't we go on to call usb_eq_queue() which can't be valid if the gadget > has been destroyed? > > I don't see how cdev->gadget can be set to null without cdev being freed > so is this actually a use-after-free not a simple null-dereference? > > My guess is that somehow the gadget is being destroyed while the network > interface is held open (we've seen similar issues in other, non-network, > gadget functions), but I don't know enough about the network side of > things to know how to cause that from userspace. I'm still waiting on confirmation of whether this fixes things. So far we've seen it crash twice without the fix... I don't know what triggers it - it's being triggered in some huge automated test framework. Whether the issue is lack of bind, or a too early gadget unbind... or something else... As mentioned in the patch, I'm not entirely sure if this is the right fix... I certainly don't claim to understand the usb/gadget stack. It does seem like usb_ep_queue() at least has some protections in place... though no idea if they're enough - and whether we'll hit a WARN_ON_ONCE now instead? Honestly I don't even understand *why* we're sending out this speed notification out of ncm_close... (and if we do send speed notification of out ncm_close()... shouldn't it always just say speed 0?)
diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c index c36bcfa0e9b4..424bb3b666db 100644 --- a/drivers/usb/gadget/function/f_ncm.c +++ b/drivers/usb/gadget/function/f_ncm.c @@ -83,7 +83,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f) /* peak (theoretical) bulk transfer rate in bits-per-second */ static inline unsigned ncm_bitrate(struct usb_gadget *g) { - if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) + if (!g) + return 0; + else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS) return 4250000000U; else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER) return 3750000000U;
In Google internal bug 265639009 we've received an (as yet) unreproducible crash report from an aarch64 GKI 5.10.149-android13 running device. AFAICT the source code is at: https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13-5.10 The call stack is: ncm_close() -> ncm_notify() -> ncm_do_notify() with the crash at: ncm_do_notify+0x98/0x270 Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b) Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...): // halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification) 0B 0D 00 79 strh w11, [x8, #6] // word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request) 6C 0A 00 B9 str w12, [x19, #8] // x10 (NULL) was read here from offset 0 of valid pointer x9 // IMHO we're reading 'cdev->gadget' and getting NULL // gadget is indeed at offset 0 of struct usb_composite_dev 2A 01 40 F9 ldr x10, [x9] // loading req->buf pointer, which is at offset 0 of struct usb_request 69 02 40 F9 ldr x9, [x19] // x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed 4B 5D 40 B9 ldr w11, [x10, #0x5c] which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment: event->wLength = cpu_to_le16(8); req->length = NCM_STATUS_BYTECOUNT; /* SPEED_CHANGE data is up/down speeds in bits/sec */ data = req->buf + sizeof *event; data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); My analysis of registers and NULL ptr deref crash offset (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c) heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing: data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget)); which calls: ncm_bitrate(NULL) which then calls: gadget_is_superspeed(NULL) which reads ((struct usb_gadget *)NULL)->max_speed and hits a panic. AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C. (remember there's a GKI KABI reservation of 16 bytes in struct work_struct) It's not at all clear to me how this is all supposed to work... but returning 0 seems much better than panic-ing... Cc: Felipe Balbi <balbi@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Carlos Llamas <cmllamas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Maciej Żenczykowski <maze@google.com> --- drivers/usb/gadget/function/f_ncm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)