Message ID | 20190204180422.5095-2-guido@kiener-muenchen.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/3] udc: net2280: Fix net2280_disable | expand |
On Mon, 4 Feb 2019, Guido Kiener wrote: > From: Guido Kiener <guido.kiener@rohde-schwarz.com> > > The OUT endpoint normally blocks (NAK) subsequent packets when a > short packet is received and returns an incomplete queue entry to > the gadget driver. Thereby the gadget driver can detect > a short packet when reading queue entries with a length that is > not equal to a multiple of packet size. > > The start_queue() function enables receiving OUT packets regardless > of the content of the OUT FIFO. This results in a problem: > When receiving is enabled more OUT packets are appended to the last > short packet and the gadget driver cannot determine the end of a > short packet anymore. Furthermore the remaining data in the OUT FIFO > is not delivered to the gadget driver immediately and can produce > timeout errors. How can that happen? When the short packet is received, the endpoint immediately starts sending NAKs, so no more data can enter the FIFO. Furthermore, the incomplete transfer is returned to the gadget driver, which removes the short packet from the FIFO. So the FIFO has to be empty when start_queue() is called. Alan Stern > This fix only stops OUT naking when all FIFO data is delivered > to the gadget driver and the OUT FIFO is empty. > > Signed-off-by: Guido Kiener <guido.kiener@rohde-schwarz.com> > --- > drivers/usb/gadget/udc/net2280.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c > index 7154f00dea40..1cb58fd5d1c6 100644 > --- a/drivers/usb/gadget/udc/net2280.c > +++ b/drivers/usb/gadget/udc/net2280.c > @@ -867,7 +867,7 @@ static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) > > writel(BIT(DMA_START), &dma->dmastat); > > - if (!ep->is_in) > + if (!ep->is_in && readl(&ep->regs->ep_avail) == 0) > stop_out_naking(ep); > }
Zitat von Alan Stern <stern@rowland.harvard.edu>: > On Mon, 4 Feb 2019, Guido Kiener wrote: > >> From: Guido Kiener <guido.kiener@rohde-schwarz.com> >> >> The OUT endpoint normally blocks (NAK) subsequent packets when a >> short packet is received and returns an incomplete queue entry to >> the gadget driver. Thereby the gadget driver can detect >> a short packet when reading queue entries with a length that is >> not equal to a multiple of packet size. >> >> The start_queue() function enables receiving OUT packets regardless >> of the content of the OUT FIFO. This results in a problem: >> When receiving is enabled more OUT packets are appended to the last >> short packet and the gadget driver cannot determine the end of a >> short packet anymore. Furthermore the remaining data in the OUT FIFO >> is not delivered to the gadget driver immediately and can produce >> timeout errors. > > How can that happen? When the short packet is received, the endpoint > immediately starts sending NAKs, so no more data can enter the FIFO. > Furthermore, the incomplete transfer is returned to the gadget driver, > which removes the short packet from the FIFO. So the FIFO has to be > empty when start_queue() is called. I don't remember the exact scenario. It happens something like this: When you have only one request (e.g. 4kB) left in your endpoint OUT queue and a host sends 4kB + some extra bytes, then your request (4kB) is delivered to the gadget driver while the extra bytes (short packet) are still in the OUT FIFO. I try to reconstruct the situation (without fix) how the driver can call (re)start_dma. I do not see a reasonable path in the moment. Guido >> This fix only stops OUT naking when all FIFO data is delivered >> to the gadget driver and the OUT FIFO is empty.
On Tue, 5 Feb 2019 guido@kiener-muenchen.de wrote: > > Zitat von Alan Stern <stern@rowland.harvard.edu>: > > > On Mon, 4 Feb 2019, Guido Kiener wrote: > > > >> From: Guido Kiener <guido.kiener@rohde-schwarz.com> > >> > >> The OUT endpoint normally blocks (NAK) subsequent packets when a > >> short packet is received and returns an incomplete queue entry to > >> the gadget driver. Thereby the gadget driver can detect > >> a short packet when reading queue entries with a length that is > >> not equal to a multiple of packet size. > >> > >> The start_queue() function enables receiving OUT packets regardless > >> of the content of the OUT FIFO. This results in a problem: > >> When receiving is enabled more OUT packets are appended to the last > >> short packet and the gadget driver cannot determine the end of a > >> short packet anymore. Furthermore the remaining data in the OUT FIFO > >> is not delivered to the gadget driver immediately and can produce > >> timeout errors. > > > > How can that happen? When the short packet is received, the endpoint > > immediately starts sending NAKs, so no more data can enter the FIFO. > > Furthermore, the incomplete transfer is returned to the gadget driver, > > which removes the short packet from the FIFO. So the FIFO has to be > > empty when start_queue() is called. > > I don't remember the exact scenario. It happens something like this: > When you have only one request (e.g. 4kB) left in your endpoint OUT > queue and a host sends 4kB + some extra bytes, then your request (4kB) > is delivered to the gadget driver while the extra bytes (short packet) > are still in the OUT FIFO. > I try to reconstruct the situation (without fix) how the driver can call > (re)start_dma. I do not see a reasonable path in the moment. Okay, I see. If there's no outstanding request when the short packet arrives, the data will remain in the FIFO. But won't this situation be detected in start_dma(), which is the only place where start_queue() is called? start_dma() has code to test specifically for a previous short OUT packet. Should the fix go there instead? Alan Stern
Zitat von Alan Stern <stern@rowland.harvard.edu>: > On Tue, 5 Feb 2019 guido@kiener-muenchen.de wrote: > >> >> Zitat von Alan Stern <stern@rowland.harvard.edu>: >> >> > On Mon, 4 Feb 2019, Guido Kiener wrote: >> > >> >> From: Guido Kiener <guido.kiener@rohde-schwarz.com> >> >> >> >> The OUT endpoint normally blocks (NAK) subsequent packets when a >> >> short packet is received and returns an incomplete queue entry to >> >> the gadget driver. Thereby the gadget driver can detect >> >> a short packet when reading queue entries with a length that is >> >> not equal to a multiple of packet size. >> >> >> >> The start_queue() function enables receiving OUT packets regardless >> >> of the content of the OUT FIFO. This results in a problem: >> >> When receiving is enabled more OUT packets are appended to the last >> >> short packet and the gadget driver cannot determine the end of a >> >> short packet anymore. Furthermore the remaining data in the OUT FIFO >> >> is not delivered to the gadget driver immediately and can produce >> >> timeout errors. >> > >> > How can that happen? When the short packet is received, the endpoint >> > immediately starts sending NAKs, so no more data can enter the FIFO. >> > Furthermore, the incomplete transfer is returned to the gadget driver, >> > which removes the short packet from the FIFO. So the FIFO has to be >> > empty when start_queue() is called. >> >> I don't remember the exact scenario. It happens something like this: >> When you have only one request (e.g. 4kB) left in your endpoint OUT >> queue and a host sends 4kB + some extra bytes, then your request (4kB) >> is delivered to the gadget driver while the extra bytes (short packet) >> are still in the OUT FIFO. >> I try to reconstruct the situation (without fix) how the driver can call >> (re)start_dma. I do not see a reasonable path in the moment. > > Okay, I see. If there's no outstanding request when the short packet > arrives, the data will remain in the FIFO. > > But won't this situation be detected in start_dma(), which is the only > place where start_queue() is called? start_dma() has code to test > specifically for a previous short OUT packet. Should the fix go there > instead? You are right. Unfortunately I did not save a callstack. I guess it was more a restart_dma(). We do a lot of unusual things like stalling endpoints, clear_halt, dequeue requests and so on. The biggest problem is to stop/abort the DMA (I have some more comlexer patches in the queue). I just remember that it took me a long time to find this bug and I try to reproduce it this week. Guido
Zitat von Alan Stern <stern@rowland.harvard.edu>: > On Tue, 5 Feb 2019 guido@kiener-muenchen.de wrote: > >> >> Zitat von Alan Stern <stern@rowland.harvard.edu>: >> >> > On Mon, 4 Feb 2019, Guido Kiener wrote: >> > >> >> From: Guido Kiener <guido.kiener@rohde-schwarz.com> >> >> >> >> The OUT endpoint normally blocks (NAK) subsequent packets when a >> >> short packet is received and returns an incomplete queue entry to >> >> the gadget driver. Thereby the gadget driver can detect >> >> a short packet when reading queue entries with a length that is >> >> not equal to a multiple of packet size. >> >> >> >> The start_queue() function enables receiving OUT packets regardless >> >> of the content of the OUT FIFO. This results in a problem: >> >> When receiving is enabled more OUT packets are appended to the last >> >> short packet and the gadget driver cannot determine the end of a >> >> short packet anymore. Furthermore the remaining data in the OUT FIFO >> >> is not delivered to the gadget driver immediately and can produce >> >> timeout errors. >> > >> > How can that happen? When the short packet is received, the endpoint >> > immediately starts sending NAKs, so no more data can enter the FIFO. >> > Furthermore, the incomplete transfer is returned to the gadget driver, >> > which removes the short packet from the FIFO. So the FIFO has to be >> > empty when start_queue() is called. >> >> I don't remember the exact scenario. It happens something like this: >> When you have only one request (e.g. 4kB) left in your endpoint OUT >> queue and a host sends 4kB + some extra bytes, then your request (4kB) >> is delivered to the gadget driver while the extra bytes (short packet) >> are still in the OUT FIFO. >> I try to reconstruct the situation (without fix) how the driver can call >> (re)start_dma. I do not see a reasonable path in the moment. > > Okay, I see. If there's no outstanding request when the short packet > arrives, the data will remain in the FIFO. > > But won't this situation be detected in start_dma(), which is the only > place where start_queue() is called? start_dma() has code to test > specifically for a previous short OUT packet. Should the fix go there > instead? > Alan, I have inserted a bug detection and can now explain the race: Ween receiving a long transfer the gadget driver queues a new request with net2280_queue() and calls start_dma(). In the meantime the OUT FIFO is still receiving data and the short packet is not detected yet. So the start_dma() function does not use the shortcut in line: https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L905 Thus start_queue() is called at the end of the function which calls stop_out_naking(). This will allow receiving new data, even when a short packet is arrived in the meantime. Indeed, I'm not sure now whether my patch only minimizes the race or whether we need to rethink the logic here and call stop_out_naking() when the request is really done and delivered to the gadget driver. See function done(). Here is my code sample how I can detect the race (USB 2.0 with max packet size of 512): static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) { struct net2280_dma_regs __iomem *dma = ep->dma; unsigned int tmp = BIT(VALID_BIT) | (ep->is_in << DMA_DIRECTION); if (!(ep->dev->quirks & PLX_2280)) tmp |= BIT(END_OF_CHAIN); ep_vdbg(ep->dev, "start_queue\n"); writel(tmp, &dma->dmacount); writel(readl(&dma->dmastat), &dma->dmastat); writel(td_dma, &dma->dmadesc); if (ep->dev->quirks & PLX_PCIE) dmactl |= BIT(DMA_REQUEST_OUTSTANDING); writel(dmactl, &dma->dmactl); /* erratum 0116 workaround part 3: pci arbiter away from net2280 */ (void) readl(&ep->dev->pci->pcimstctl); // Show bug begin unsigned int avail = readl(&ep->regs->ep_avail); if (!ep->is_in && (avail % 512) != 0) { u32 t1, t2; t1 = readl(&ep->cfg->ep_cfg); t2 = readl(&ep->regs->ep_rsp) & 0xff; ep_err(ep->dev, " stat %08x avail %04x (ep%d%s-%s)%s\n", readl(&ep->regs->ep_stat), readl(&ep->regs->ep_avail), t1 & 0x0f, DIR_STRING(t1), type_string(t1 >> 8), ep->stopped ? "*" : ""); WARN_ON(1); } // Show bug end writel(BIT(DMA_START), &dma->dmastat); //if (!ep->is_in && readl(&ep->regs->ep_avail) == 0) //<- patch if (!ep->is_in) stop_out_naking(ep); } This is the corresponding callstack. Flag "NAK packets" (BIT4) is set in EP_STAT and there are now 944 (0x3b0) bytes in the OUT FIFO. net2280 0000:05:05.0: stat 0001103a avail 03b0 (ep1out-bulk) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 32093 at net2280.c:915 start_dma+0x1b2/0x2c0 [net2280]() Call Trace: dump_stack+0x63/0x90 warn_slowpath_common+0x82/0xc0 warn_slowpath_null+0x1a/0x20 start_dma+0x1b2/0x2c0 [net2280] net2280_queue+0x247/0x390 [net2280] _enqueueBulkoutBuf+0xa8/0x1f0 [rsusbtmc] dev_read+0x33e/0x490 [rsusbtmc] ? security_file_permission+0xa0/0xc0 __vfs_read+0x18/0x40 vfs_read+0x86/0x130 SyS_read+0x55/0xc0 ? SyS_ioctl+0x63/0x90 What do you think? Regards, Guido
On Wed, 6 Feb 2019 guido@kiener-muenchen.de wrote: > Alan, > > I have inserted a bug detection and can now explain the race: > > Ween receiving a long transfer the gadget driver queues a new request with > net2280_queue() and calls start_dma(). In the meantime the OUT FIFO is still > receiving data and the short packet is not detected yet. So the start_dma() > function does not use the shortcut in line: > > https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L905 > > Thus start_queue() is called at the end of the function which calls > stop_out_naking(). This will allow receiving new data, even when a short > packet is arrived in the meantime. I'm confused. A transfer is in progress, let's say for request R1. Before R1 finishes, the gadget driver queues a new request, R2. Since R1 is still running, ep->queue is not empty, so net2280_queue() will not call start_dma() -- it will call queue_dma() instead. But this is not what your stack dump shows. Maybe you meant that R1 completes and then afterward the gadget driver queues R2. Then there is a race: start_dma() sees that no short packet has arrived, but a short packet could arrive before stop_out_naking() is called. > Indeed, I'm not sure now whether my patch only minimizes the race or > whether we need to rethink the logic here and call stop_out_naking() when > the request is really done and delivered to the gadget driver. > See function done(). The right time to call stop_out_naking is immediately after we complete a short transfer (that is, a transfer which ended with a short packet). start_queue() is definitely the wrong place. Right at the end of done() is probably the right place. But do it only if the last packet of the transfer was a short one. And note that this does not necessarily mean the transfer got an error: If the request was for only 15 bytes then it could complete successfully but the (only) packet would still be short. Alan Stern
Zitat von Alan Stern <stern@rowland.harvard.edu>: > On Wed, 6 Feb 2019 guido@kiener-muenchen.de wrote: > >> Alan, >> >> I have inserted a bug detection and can now explain the race: >> >> Ween receiving a long transfer the gadget driver queues a new request with >> net2280_queue() and calls start_dma(). In the meantime the OUT FIFO is still >> receiving data and the short packet is not detected yet. So the start_dma() >> function does not use the shortcut in line: >> >> https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L905 >> >> Thus start_queue() is called at the end of the function which calls >> stop_out_naking(). This will allow receiving new data, even when a short >> packet is arrived in the meantime. > > I'm confused. A transfer is in progress, let's say for request R1. > Before R1 finishes, the gadget driver queues a new request, R2. Since > R1 is still running, ep->queue is not empty, so net2280_queue() will > not call start_dma() -- it will call queue_dma() instead. But this is > not what your stack dump shows. > > Maybe you meant that R1 completes and then afterward the gadget driver > queues R2. Then there is a race: start_dma() sees that no short packet > has arrived, but a short packet could arrive before stop_out_naking() > is called. Yes, I mean the second case. Currently we put only one request into the OUT queue. Using 2-3 requests would even improve bandwidth for long transfers, however it does not help to avoid the bug when the application input is blocked. >> Indeed, I'm not sure now whether my patch only minimizes the race or >> whether we need to rethink the logic here and call stop_out_naking() when >> the request is really done and delivered to the gadget driver. >> See function done(). > > The right time to call stop_out_naking is immediately after we complete > a short transfer (that is, a transfer which ended with a short packet). > start_queue() is definitely the wrong place. > > Right at the end of done() is probably the right place. But do it only > if the last packet of the transfer was a short one. And note that this > does not necessarily mean the transfer got an error: If the request was > for only 15 bytes then it could complete successfully but the (only) > packet would still be short. I tried to call stop_out_naking() at the end of done(). However I have overseen that the author of the driver only wants to receive data when there is an OUT request available. E.g. see: https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L304 So I think my patch was not so "bad" and I assume we can prevent the race with the following patch (or easier see below): static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) { struct net2280_dma_regs __iomem *dma = ep->dma; unsigned int tmp = BIT(VALID_BIT) | (ep->is_in << DMA_DIRECTION); if (!(ep->dev->quirks & PLX_2280)) tmp |= BIT(END_OF_CHAIN); writel(tmp, &dma->dmacount); writel(readl(&dma->dmastat), &dma->dmastat); writel(td_dma, &dma->dmadesc); if (ep->dev->quirks & PLX_PCIE) dmactl |= BIT(DMA_REQUEST_OUTSTANDING); writel(dmactl, &dma->dmactl); /* erratum 0116 workaround part 3: pci arbiter away from net2280 */ (void) readl(&ep->dev->pci->pcimstctl); if (!ep->is_in && (readl(&ep->regs->ep_stat) & BIT(NAK_OUT_PACKETS)) && (readl(&ep->regs->ep_avail) == 0)) { stop_out_naking(ep); } writel(BIT(DMA_START), &dma->dmastat); } start_queue() is only called by start_dma(). I see that we can simplify this patch when we just insert the stop_out_nacking(ep) between line 909 and 910: https://elixir.bootlin.com/linux/v5.0-rc5/source/drivers/usb/gadget/udc/net2280.c#L909 and remove the stop_out_naking() call in start_queue(). I will test this over weekend. Regards, Guido
diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index 7154f00dea40..1cb58fd5d1c6 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -867,7 +867,7 @@ static void start_queue(struct net2280_ep *ep, u32 dmactl, u32 td_dma) writel(BIT(DMA_START), &dma->dmastat); - if (!ep->is_in) + if (!ep->is_in && readl(&ep->regs->ep_avail) == 0) stop_out_naking(ep); }