Message ID | 20180717114104.irgdb5rmg2qxclgp@debian (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > I did some more debugging. Tested with a KASAN enabled kernel and that > shows the problem. The report is attached. > > To my understanding: > > btusb_work() is calling usb_set_interface() with alternate = 0. which > again calls usb_hcd_alloc_bandwidth() and that frees the rings by > xhci_free_endpoint_ring(). That doesn't sound like the right thing to do. The rings shouldn't be freed until xhci_endpoint_disable() is called. On the other hand, there doesn't appear to be any xhci_endpoint_disable() routine, although a comment refers to it. Maybe this is the real problem? Alan Stern > But then usb_set_interface() continues and > calls usb_disable_interface() -> usb_hcd_flush_endpoint()->unlink1()-> > xhci_urb_dequeue() which at the end gives the command to stop endpoint. > > In all the cycles I have tested I see that only in the fail case > handle_cmd_completion() gets called, but in the cycles where the error > is not there handle_cmd_completion() is not called with that command. > > I am not sure what is happening, and you are the best person to understand > what is happening. :) > > But for now (untill you are back from holiday and suggest a proper solution), > I made a hacky patch (attached) which is working and I donot get any > corruption after that. Both KASAN and slub debug are also happy. > > So, now waiting for you to analyze what is going on and suggest a proper > fix. > > Thanks in advance. > > -- > Regards > Sudip > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Alan, On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: > On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > > > I did some more debugging. Tested with a KASAN enabled kernel and that > > shows the problem. The report is attached. > > > > To my understanding: > > > > btusb_work() is calling usb_set_interface() with alternate = 0. which > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by > > xhci_free_endpoint_ring(). > > That doesn't sound like the right thing to do. The rings shouldn't be > freed until xhci_endpoint_disable() is called. > > On the other hand, there doesn't appear to be any > xhci_endpoint_disable() routine, although a comment refers to it. > Maybe this is the real problem? one of your old mail might help :) https://www.spinics.net/lists/linux-usb/msg98123.html -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote: > Hi Alan, > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > > > > > I did some more debugging. Tested with a KASAN enabled kernel and that > > > shows the problem. The report is attached. > > > > > > To my understanding: > > > > > > btusb_work() is calling usb_set_interface() with alternate = 0. which > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by > > > xhci_free_endpoint_ring(). > > > > That doesn't sound like the right thing to do. The rings shouldn't be > > freed until xhci_endpoint_disable() is called. > > > > On the other hand, there doesn't appear to be any > > xhci_endpoint_disable() routine, although a comment refers to it. > > Maybe this is the real problem? > > one of your old mail might help :) > > https://www.spinics.net/lists/linux-usb/msg98123.html Wrote too soon. Is it the one you are looking for - usb_disable_endpoint() is in drivers/usb/core/message.c -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote: > > Hi Alan, > > > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: > > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > > > > > > > I did some more debugging. Tested with a KASAN enabled kernel and that > > > > shows the problem. The report is attached. > > > > > > > > To my understanding: > > > > > > > > btusb_work() is calling usb_set_interface() with alternate = 0. which > > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by > > > > xhci_free_endpoint_ring(). > > > > > > That doesn't sound like the right thing to do. The rings shouldn't be > > > freed until xhci_endpoint_disable() is called. > > > > > > On the other hand, there doesn't appear to be any > > > xhci_endpoint_disable() routine, although a comment refers to it. > > > Maybe this is the real problem? > > > > one of your old mail might help :) > > > > https://www.spinics.net/lists/linux-usb/msg98123.html That message seems to say the same thing as what I just wrote, more or less. > Wrote too soon. > > Is it the one you are looking for - > usb_disable_endpoint() is in drivers/usb/core/message.c No, I'm talking about xhci_endpoint_disable(), which would be called by usb_hcd_disable_endpoint() if it existed. Of course, usb_hcd_disable_endpoint() is called by usb_disable_endpoint(). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Alan, Greg, On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote: > On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote: > > Hi Alan, > > > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: > > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > > > > > > > I did some more debugging. Tested with a KASAN enabled kernel and that > > > > shows the problem. The report is attached. > > > > > > > > To my understanding: > > > > > > > > btusb_work() is calling usb_set_interface() with alternate = 0. which > > > > again calls usb_hcd_alloc_bandwidth() and that frees the rings by > > > > xhci_free_endpoint_ring(). > > > > > > That doesn't sound like the right thing to do. The rings shouldn't be > > > freed until xhci_endpoint_disable() is called. > > > > > > On the other hand, there doesn't appear to be any > > > xhci_endpoint_disable() routine, although a comment refers to it. > > > Maybe this is the real problem? > > > > one of your old mail might help :) > > > > https://www.spinics.net/lists/linux-usb/msg98123.html > > Wrote too soon. > > Is it the one you are looking for - > usb_disable_endpoint() is in drivers/usb/core/message.c I think now I understand what the problem is. usb_set_interface() calls usb_disable_interface() which again calls usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint(). After flushing the endpoints usb_disable_endpoint() calls usb_hcd_disable_endpoint() which tries to do: if (hcd->driver->endpoint_disable) hcd->driver->endpoint_disable(hcd, ep); but there is no endpoint_disable() callback in xhci, so the endpoint is never marked as disabled. So, next time usb_hcd_flush_endpoint() is called I get this corruption. And this is exactly where I used to see the problem happening. And, my hacky patch worked as I prevented it from calling usb_disable_interface() in this particular case. Greg - answering your question here. My hacky patch was based on the fact that usb_hcd_alloc_bandwidth() is calling hcd->driver->drop_endpoint() and hcd->driver->add_endpoint() if (cur_alt && new_alt). So, I prevented usb_disable_interface() to be called for that same condition. And that worked as the call to usb_hcd_flush_endpoint() was not executed. I know it is not correct and I might be having memory leaks for this, but I have the system working till we get the actual fix. -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 17.07.2018 18:10, Sudip Mukherjee wrote: > Hi Alan, Greg, > > On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote: >> On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote: >>> Hi Alan, >>> >>> On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: >>>> On Tue, 17 Jul 2018, Sudip Mukherjee wrote: >>>> >>>>> I did some more debugging. Tested with a KASAN enabled kernel and that >>>>> shows the problem. The report is attached. >>>>> >>>>> To my understanding: >>>>> >>>>> btusb_work() is calling usb_set_interface() with alternate = 0. which >>>>> again calls usb_hcd_alloc_bandwidth() and that frees the rings by >>>>> xhci_free_endpoint_ring(). >>>> >>>> That doesn't sound like the right thing to do. The rings shouldn't be >>>> freed until xhci_endpoint_disable() is called. >>>> >>>> On the other hand, there doesn't appear to be any >>>> xhci_endpoint_disable() routine, although a comment refers to it. >>>> Maybe this is the real problem? >>> >>> one of your old mail might help :) >>> >>> https://www.spinics.net/lists/linux-usb/msg98123.html >> >> Wrote too soon. >> >> Is it the one you are looking for - >> usb_disable_endpoint() is in drivers/usb/core/message.c > > I think now I understand what the problem is. > usb_set_interface() calls usb_disable_interface() which again calls > usb_disable_endpoint(). This usb_disable_endpoint() gets the pointer > to 'ep', marks it as NULL and sends the pointer to usb_hcd_flush_endpoint(). > After flushing the endpoints usb_disable_endpoint() calls > usb_hcd_disable_endpoint() which tries to do: > if (hcd->driver->endpoint_disable) > hcd->driver->endpoint_disable(hcd, ep); > but there is no endpoint_disable() callback in xhci, so the endpoint is > never marked as disabled. So, next time usb_hcd_flush_endpoint() is > called I get this corruption. > And this is exactly where I used to see the problem happening. > > And, my hacky patch worked as I prevented it from calling > usb_disable_interface() in this particular case. > Back for a few days, looking at this xhci driver will set up all the endpoints for the new altsetting already in usb_hcd_alloc_bandwidth(). New endpoints will be ready and rings running after this. I don't know the exact history behind this, but I assume it is because xhci does all of the steps to drop/add, disable/enable endpoints and check bandwidth in a single configure endpoint command, that will return errors if there is not enough bandwidth. This command is issued in hcd->driver->check_bandwidth() This means that xhci doesn't really do much in hcd->driver->endpoint_disable or hcd->driver->endpoint_enable It also means that xhci driver assumes rings are empty when hcd->driver->check_bandwidth is called. It will bluntly free dropped rings. If there are URBs left on a endpoint ring that was dropped+added (freed+reallocated) then those URBs will contain pointers to freed ring, causing issues when usb_hcd_flush_endpoint() cancels those URBs. usb_set_interface() usb_hcd_alloc_bandwidth() hcd->driver->drop_endpoint() hcd->driver->add_endpoint() // allocates new rings hcd->driver->check_bandwidth() // issues configure endpoint command, free rings. usb_disable_interface(iface, true) usb_disable_endpoint() usb_hcd_flush_endpoint() // will access freed ring if URBs found!! usb_hcd_disable_endpoint() hcd->driver->endpoint_disable() // xhci does nothing usb_enable_interface(iface, true) usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side. As first aid I could try to implement checks that make sure the flushed URBs trb pointers really are on the current endpoint ring, and also add some warning if we are we are dropping endpoints with URBs still queued. But we need to fix this properly as well. xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci has the altssetting up and running when usb core hasn't event started flushing endpoints. -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Mathias, On Thu, Jul 19, 2018 at 01:59:01PM +0300, Mathias Nyman wrote: > On 17.07.2018 18:10, Sudip Mukherjee wrote: > > Hi Alan, Greg, > > > > On Tue, Jul 17, 2018 at 03:49:18PM +0100, Sudip Mukherjee wrote: > > > On Tue, Jul 17, 2018 at 03:40:22PM +0100, Sudip Mukherjee wrote: > > > > Hi Alan, > > > > > > > > On Tue, Jul 17, 2018 at 10:28:14AM -0400, Alan Stern wrote: > > > > > On Tue, 17 Jul 2018, Sudip Mukherjee wrote: > > > > > > > > > > > I did some more debugging. Tested with a KASAN enabled kernel and that > > > > > > shows the problem. The report is attached. > > > > > > <snip> > > > > And, my hacky patch worked as I prevented it from calling > > usb_disable_interface() in this particular case. > > > > Back for a few days, looking at this I hope you had a good holiday. :) > > xhci driver will set up all the endpoints for the new altsetting already in > usb_hcd_alloc_bandwidth(). > <snip> > > As first aid I could try to implement checks that make sure the flushed URBs > trb pointers really are on the current endpoint ring, and also add some warning > if we are we are dropping endpoints with URBs still queued. Yes, please. I think your first-aid will be a much better option than the hacky patch I am using atm. > > But we need to fix this properly as well. > xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci > has the altssetting up and running when usb core hasn't event started flushing endpoints. I am able to reproduce this on almost all cycles, so I can always test the fix for you after you are fully back from your holiday. -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 19 Jul 2018, Mathias Nyman wrote: > xhci driver will set up all the endpoints for the new altsetting already in > usb_hcd_alloc_bandwidth(). > > New endpoints will be ready and rings running after this. I don't know the exact > history behind this, but I assume it is because xhci does all of the steps to > drop/add, disable/enable endpoints and check bandwidth in a single configure > endpoint command, that will return errors if there is not enough bandwidth. That's right; Sarah and I spent some time going over this while she was working on it. But it looks like the approach isn't adequate. > This command is issued in hcd->driver->check_bandwidth() > This means that xhci doesn't really do much in hcd->driver->endpoint_disable or > hcd->driver->endpoint_enable > > It also means that xhci driver assumes rings are empty when > hcd->driver->check_bandwidth is called. It will bluntly free dropped rings. > If there are URBs left on a endpoint ring that was dropped+added > (freed+reallocated) then those URBs will contain pointers to freed ring, > causing issues when usb_hcd_flush_endpoint() cancels those URBs. > > usb_set_interface() > usb_hcd_alloc_bandwidth() > hcd->driver->drop_endpoint() > hcd->driver->add_endpoint() // allocates new rings > hcd->driver->check_bandwidth() // issues configure endpoint command, free rings. > usb_disable_interface(iface, true) > usb_disable_endpoint() > usb_hcd_flush_endpoint() // will access freed ring if URBs found!! > usb_hcd_disable_endpoint() > hcd->driver->endpoint_disable() // xhci does nothing > usb_enable_interface(iface, true) > usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side. > > As first aid I could try to implement checks that make sure the flushed URBs > trb pointers really are on the current endpoint ring, and also add some warning > if we are we are dropping endpoints with URBs still queued. > > But we need to fix this properly as well. > xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci > has the altssetting up and running when usb core hasn't event started flushing endpoints. Absolutely. The core tries to be compatible with host controller drivers that either allocate bandwidth as it is requested or else allocate bandwidth all at once when an altsetting is installed. xhci-hcd falls into the second category. However, this approach requires the bandwidth verification for the new altsetting to be performed before the old altsetting has been disabled, and the xHCI hardware can't do this. We may need to change the core so that the old endpoints are disabled before the bandwidth check is done, instead of after. Of course, this leads to an awkward situation if the check fails -- we'd probably have to go back and re-install the old altsetting. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19.07.2018 17:57, Alan Stern wrote: > On Thu, 19 Jul 2018, Mathias Nyman wrote: > >> xhci driver will set up all the endpoints for the new altsetting already in >> usb_hcd_alloc_bandwidth(). >> >> New endpoints will be ready and rings running after this. I don't know the exact >> history behind this, but I assume it is because xhci does all of the steps to >> drop/add, disable/enable endpoints and check bandwidth in a single configure >> endpoint command, that will return errors if there is not enough bandwidth. > > That's right; Sarah and I spent some time going over this while she was > working on it. But it looks like the approach isn't adequate. > >> This command is issued in hcd->driver->check_bandwidth() >> This means that xhci doesn't really do much in hcd->driver->endpoint_disable or >> hcd->driver->endpoint_enable >> >> It also means that xhci driver assumes rings are empty when >> hcd->driver->check_bandwidth is called. It will bluntly free dropped rings. >> If there are URBs left on a endpoint ring that was dropped+added >> (freed+reallocated) then those URBs will contain pointers to freed ring, >> causing issues when usb_hcd_flush_endpoint() cancels those URBs. >> >> usb_set_interface() >> usb_hcd_alloc_bandwidth() >> hcd->driver->drop_endpoint() >> hcd->driver->add_endpoint() // allocates new rings >> hcd->driver->check_bandwidth() // issues configure endpoint command, free rings. >> usb_disable_interface(iface, true) >> usb_disable_endpoint() >> usb_hcd_flush_endpoint() // will access freed ring if URBs found!! >> usb_hcd_disable_endpoint() >> hcd->driver->endpoint_disable() // xhci does nothing >> usb_enable_interface(iface, true) >> usb_enable_endpoint(ep_addrss, true) // not really doing much on xhci side. >> >> As first aid I could try to implement checks that make sure the flushed URBs >> trb pointers really are on the current endpoint ring, and also add some warning >> if we are we are dropping endpoints with URBs still queued. >> >> But we need to fix this properly as well. >> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci >> has the altssetting up and running when usb core hasn't event started flushing endpoints. > > Absolutely. The core tries to be compatible with host controller > drivers that either allocate bandwidth as it is requested or else > allocate bandwidth all at once when an altsetting is installed. > > xhci-hcd falls into the second category. However, this approach > requires the bandwidth verification for the new altsetting to be > performed before the old altsetting has been disabled, and the xHCI > hardware can't do this. > > We may need to change the core so that the old endpoints are disabled > before the bandwidth check is done, instead of after. Of course, this > leads to an awkward situation if the check fails -- we'd probably have > to go back and re-install the old altsetting. That would help xhci a lot. If we want to avoid the awkward altsetting re-install after bandwidth failure then adding a extra endpoint flush before checking the bandwidth would already help a lot. The endpoint disabling can then be remain after bandwidth checking. Does that work for other host controllers? -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 20 Jul 2018, Mathias Nyman wrote: > >> But we need to fix this properly as well. > >> xhci needs to be more in sync with usb core in usb_set_interface(), currently xhci > >> has the altssetting up and running when usb core hasn't event started flushing endpoints. > > > > Absolutely. The core tries to be compatible with host controller > > drivers that either allocate bandwidth as it is requested or else > > allocate bandwidth all at once when an altsetting is installed. > > > > xhci-hcd falls into the second category. However, this approach > > requires the bandwidth verification for the new altsetting to be > > performed before the old altsetting has been disabled, and the xHCI > > hardware can't do this. > > > > We may need to change the core so that the old endpoints are disabled > > before the bandwidth check is done, instead of after. Of course, this > > leads to an awkward situation if the check fails -- we'd probably have > > to go back and re-install the old altsetting. > > That would help xhci a lot. > > If we want to avoid the awkward altsetting re-install after bandwidth failure > then adding a extra endpoint flush before checking the bandwidth would already help a lot. > > The endpoint disabling can then be remain after bandwidth checking. > Does that work for other host controllers? As far as I know, the other host controller drivers don't really care how this is done. xHCI is the only technology where the hardware has to verify the bandwidth requirements. (Maybe some other SuperSpeed controller design also cares, but if so then this change is unlikely to hurt.) Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From cbbe6dc59ac90a4f2c358de56e58e254320171e0 Mon Sep 17 00:00:00 2001 From: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Date: Tue, 10 Jul 2018 09:50:00 +0100 Subject: [PATCH] hacky solution to mem-corruption Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> --- drivers/usb/core/message.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c index 7cd4ec33dbf4..7fdf7a27611d 100644 --- a/drivers/usb/core/message.c +++ b/drivers/usb/core/message.c @@ -1398,7 +1398,8 @@ int usb_set_interface(struct usb_device *dev, int interface, int alternate) remove_intf_ep_devs(iface); usb_remove_sysfs_intf_files(iface); } - usb_disable_interface(dev, iface, true); + if (!(iface->cur_altsetting && alt)) + usb_disable_interface(dev, iface, true); iface->cur_altsetting = alt; -- 2.11.0