Message ID | 80eace7a-976d-65a5-a353-54a2b18edd06@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Mathias, On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: > Hi > > On 24.05.2018 00:29, Sudip Mukherjee wrote: > >Hi Mathias, > > > >>>On Fri, May 18, 2018 at 03:55:04PM +0300, Mathias Nyman wrote: > >>>>Hi, <snip> > >>>> > >>>> > >>>>Can you enable tracing for xhci and send me the output. > >>> > >We have finally reproduced the error while traces were on. The trace and > >the relevant part of the dmesg (when the error starts) are in: > >https://drive.google.com/open?id=1PbcYwL1a9ndtHw1MNjE6uVqb0fFX9jV8 > > > >Will request you to have a look and suggest what might be going wrong here. > > > > Log show two rings having the same TRB segment dma address, this will completely mess up the transfer: > > While allocating rigs the enque pointers for the two rings are the same: > > 461.859315: xhci_ring_alloc: ISOC efa4e580: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ...bs > 461.859320: xhci_ring_alloc: ISOC f0ce1f00: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ... > > URBs for ISOC IN transfers are queued on EP3 at enqueue address (33386000 to 33386140) > > 461.859998: xhci_urb_enqueue: ep3in-isoc: urb f0ec0e00 pipe 4294528 slot 8 length 0/170 sgs 0/0 stream 0 flags 00010302 > 461.860004: xhci_queue_trb: ISOC: Buffer 000000002b480240 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:I:e:c > 461.860006: xhci_inc_enq: ISOC f0ce1f00: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000 > > Later URBs for ISOC OUT transfers are queued at the same address, this should not happen: > > 461.901175: xhci_urb_enqueue: ep3out-isoc: urb ecec2600 pipe 100096 slot 8 length 0/51 sgs 0/0 stream 0 flags 00010002 > 461.901180: xhci_queue_trb: ISOC: Buffer 000000002d9fa805 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:i:e:c > 461.901181: xhci_inc_enq: ISOC efa4e580: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) > > So something goes really wrong when allocating or setting up the rings in one of these functions: > xhci_ring_alloc() > xhci_alloc_segments_for_ring() > xhci_initialize_ring_info() > xhci_segment_alloc() > xhci_link_segments() > dma_pool_zalloc() > > To verify and rule out dma_pool_zalloc(), could you apply the attached patch and reproduce with new logs? We tested for the full week but still could not reproduce with the patch applied. We are still trying and will be setting up automated tests for this. And, since we are not able to reproduce it, I was wondering if it is somekind of race and the applied patch with extra tracing has changed the timing in such a way that it is not seen now. And also, wondering if 2b3ff282dff3 ("xhci: Don't add a virt_dev to the devs array before it's fully allocated") will be of any help to us. -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Mathias, On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: > Hi > > On 24.05.2018 00:29, Sudip Mukherjee wrote: > > Hi Mathias, > > > > On Fri, May 18, 2018 at 04:19:02PM +0300, Mathias Nyman wrote: > > > On 18.05.2018 16:04, Sudip Mukherjee wrote: <snip> > > > > > > We have finally reproduced the error while traces were on. The trace and > > the relevant part of the dmesg (when the error starts) are in: > > https://drive.google.com/open?id=1PbcYwL1a9ndtHw1MNjE6uVqb0fFX9jV8 > > > > Will request you to have a look and suggest what might be going wrong here. > > > > Log show two rings having the same TRB segment dma address, this will completely mess up the transfer: > > While allocating rigs the enque pointers for the two rings are the same: > > 461.859315: xhci_ring_alloc: ISOC efa4e580: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ...bs > 461.859320: xhci_ring_alloc: ISOC f0ce1f00: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ... > > URBs for ISOC IN transfers are queued on EP3 at enqueue address (33386000 to 33386140) > > 461.859998: xhci_urb_enqueue: ep3in-isoc: urb f0ec0e00 pipe 4294528 slot 8 length 0/170 sgs 0/0 stream 0 flags 00010302 > 461.860004: xhci_queue_trb: ISOC: Buffer 000000002b480240 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:I:e:c > 461.860006: xhci_inc_enq: ISOC f0ce1f00: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000 > > Later URBs for ISOC OUT transfers are queued at the same address, this should not happen: > > 461.901175: xhci_urb_enqueue: ep3out-isoc: urb ecec2600 pipe 100096 slot 8 length 0/51 sgs 0/0 stream 0 flags 00010002 > 461.901180: xhci_queue_trb: ISOC: Buffer 000000002d9fa805 length 17 TD size 0 intr 0 type 'Isoch' flags b:i:I:c:s:i:e:c > 461.901181: xhci_inc_enq: ISOC efa4e580: enq 0x0000000033386010(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) > > So something goes really wrong when allocating or setting up the rings in one of these functions: > xhci_ring_alloc() > xhci_alloc_segments_for_ring() > xhci_initialize_ring_info() > xhci_segment_alloc() > xhci_link_segments() > dma_pool_zalloc() > > To verify and rule out dma_pool_zalloc(), could you apply the attached patch and reproduce with new logs? I spoke too soon in my yesterday's mail. We were able to reproduce it on the automated tests. The log and the trace is at: https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx- Will request you to have a look at it. -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04.06.2018 18:28, Sudip Mukherjee wrote: > On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: >> >> Log show two rings having the same TRB segment dma address, this will completely mess up the transfer: >> >> While allocating rigs the enque pointers for the two rings are the same: >> >> 461.859315: xhci_ring_alloc: ISOC efa4e580: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ...bs >> 461.859320: xhci_ring_alloc: ISOC f0ce1f00: enq 0x0000000033386000(0x0000000033386000) deq 0x0000000033386000(0x0000000033386000) segs 2 stream 0 ... >> >> So something goes really wrong when allocating or setting up the rings in one of these functions: >> >> To verify and rule out dma_pool_zalloc(), could you apply the attached patch and reproduce with new logs? > > I spoke too soon in my yesterday's mail. We were able to reproduce it > on the automated tests. The log and the trace is at: > https://drive.google.com/open?id=1h-3r-1lfjg8oblBGkzdRIq8z3ZNgGZx- > > Will request you to have a look at it. > Odd and unlikely, but to me this looks like some issue in allocating dma memory from pool using dma_pool_zalloc() Adding people with DMA knowledge to cc, maybe someone knows what is going on. Here's the story: Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel. All tracing points to dma_pool_zalloc() returning the same dma address block on consecutive calls. In the failing case dma_pool_zalloc() is called 3 - 6us apart. <...>-26362 [002] .... 1186.756739: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x000000002d92b000 <...>-26362 [002] .... 1186.756745: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x000000002d92b000 <...>-26362 [002] .... 1186.756748: xhci_ring_mem_detail: MATTU xhci_segment_alloc dma @ 0x000000002d92b000 dma_pool_zalloc() is called from xhci_segment_alloc() in drivers/usb/host/xhci-mem.c see: https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci-mem.c#L52 prints above are custom traces added right after dma_pool_zalloc() @@ -44,10 +44,15 @@ static struct xhci_segment *xhci_segment_alloc(struct xhci_hcd *xhci, return NULL; } + xhci_dbg_trace(xhci, trace_xhci_ring_mem_detail, + "MATTU xhci_segment_alloc dma @ %pad", &dma); + Any idea what's going on? dma_pool_alloc() has a comment that it drops &pool->lock if it needs to allocate a page, can it be related? Thanks -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote: > On 04.06.2018 18:28, Sudip Mukherjee wrote: > > On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: > > > > Odd and unlikely, but to me this looks like some issue in allocating > dma memory > from pool using dma_pool_zalloc() > > Adding people with DMA knowledge to cc, maybe someone knows what is > going on. > > Here's the story: > Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel. > All tracing points to dma_pool_zalloc() returning the same dma address > block on > consecutive calls. > > In the failing case dma_pool_zalloc() is called 3 - 6us apart. > > <...>-26362 [002] .... 1186.756739: xhci_ring_mem_detail: MATTU > xhci_segment_alloc dma @ 0x000000002d92b000 > <...>-26362 [002] .... 1186.756745: xhci_ring_mem_detail: MATTU > xhci_segment_alloc dma @ 0x000000002d92b000 > <...>-26362 [002] .... 1186.756748: xhci_ring_mem_detail: MATTU > xhci_segment_alloc dma @ 0x000000002d92b000 > > dma_pool_zalloc() is called from xhci_segment_alloc() in > drivers/usb/host/xhci-mem.c > see: > https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci- > mem.c#L52 > > prints above are custom traces added right after dma_pool_zalloc() For better understanding it would be good to have dma_pool_free() calls debugged as well. Is it possible that something in parallel just fast enough to free the allocated resource from pool?
On Wed, Jun 06, 2018 at 05:12:21PM +0300, Mathias Nyman wrote: > On 04.06.2018 18:28, Sudip Mukherjee wrote: > > On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: > > > <snip> > > > > Will request you to have a look at it. > > > > Odd and unlikely, but to me this looks like some issue in allocating dma memory > from pool using dma_pool_zalloc() > > Adding people with DMA knowledge to cc, maybe someone knows what is going on. Thanks Mathias. -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Andy, And we meet again. :) On Wed, Jun 06, 2018 at 06:36:35PM +0300, Andy Shevchenko wrote: > On Wed, 2018-06-06 at 17:12 +0300, Mathias Nyman wrote: > > On 04.06.2018 18:28, Sudip Mukherjee wrote: > > > On Thu, May 24, 2018 at 04:35:34PM +0300, Mathias Nyman wrote: > > > > > > > Odd and unlikely, but to me this looks like some issue in allocating > > dma memory > > from pool using dma_pool_zalloc() > > > > Adding people with DMA knowledge to cc, maybe someone knows what is > > going on. > > > > Here's the story: > > Sudip sees usb issues on a Intel Atom based board with 4.14.2 kernel. > > All tracing points to dma_pool_zalloc() returning the same dma address > > block on > > consecutive calls. > > > > In the failing case dma_pool_zalloc() is called 3 - 6us apart. > > > > <...>-26362 [002] .... 1186.756739: xhci_ring_mem_detail: MATTU > > xhci_segment_alloc dma @ 0x000000002d92b000 > > <...>-26362 [002] .... 1186.756745: xhci_ring_mem_detail: MATTU > > xhci_segment_alloc dma @ 0x000000002d92b000 > > <...>-26362 [002] .... 1186.756748: xhci_ring_mem_detail: MATTU > > xhci_segment_alloc dma @ 0x000000002d92b000 > > > > dma_pool_zalloc() is called from xhci_segment_alloc() in > > drivers/usb/host/xhci-mem.c > > see: > > https://elixir.bootlin.com/linux/v4.14.2/source/drivers/usb/host/xhci- > > mem.c#L52 > > > > prints above are custom traces added right after dma_pool_zalloc() > > For better understanding it would be good to have dma_pool_free() calls > debugged as well. So, I am adding another trace event for dma_pool_free() and continuing with the test. Is there anything else that I should be adding as debug? -- Regards Sudip -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From 7aee4db28204fddff6cbc1534b8d50f13fd0b141 Mon Sep 17 00:00:00 2001 From: Mathias Nyman <mathias.nyman@linux.intel.com> Date: Thu, 24 May 2018 15:37:41 +0300 Subject: [PATCH] xhci: testpatch, add custom trace for ring segment alloc for custom debugging only --- drivers/usb/host/xhci-mem.c | 10 ++++++++++ drivers/usb/host/xhci-trace.h | 5 +++++ 2 files changed, 15 insertions(+) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index e5ace89..7d343ad 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -44,10 +44,15 @@ static struct xhci_segment *xhci_segment_alloc(struct xhci_hcd *xhci, return NULL; } + xhci_dbg_trace(xhci, trace_xhci_ring_mem_detail, + "MATTU xhci_segment_alloc dma @ %pad", &dma); + if (max_packet) { seg->bounce_buf = kzalloc(max_packet, flags); if (!seg->bounce_buf) { dma_pool_free(xhci->segment_pool, seg->trbs, dma); + xhci_dbg_trace(xhci, trace_xhci_ring_mem_detail, + "MATTU xhci segment free dma @ %pad", &dma); kfree(seg); return NULL; } @@ -58,6 +63,9 @@ static struct xhci_segment *xhci_segment_alloc(struct xhci_hcd *xhci, seg->trbs[i].link.control |= cpu_to_le32(TRB_CYCLE); } seg->dma = dma; + xhci_dbg_trace(xhci, trace_xhci_ring_mem_detail, + "MATTU xhci segment alloc seg->dma @ %pad", &seg->dma); + seg->next = NULL; return seg; @@ -67,6 +75,8 @@ static void xhci_segment_free(struct xhci_hcd *xhci, struct xhci_segment *seg) { if (seg->trbs) { dma_pool_free(xhci->segment_pool, seg->trbs, seg->dma); + xhci_dbg_trace(xhci, trace_xhci_ring_mem_detail, + "MATTU xhci segment free seg->dma @ %p", &seg->dma); seg->trbs = NULL; } kfree(seg->bounce_buf); diff --git a/drivers/usb/host/xhci-trace.h b/drivers/usb/host/xhci-trace.h index 35bdd06..951e371 100644 --- a/drivers/usb/host/xhci-trace.h +++ b/drivers/usb/host/xhci-trace.h @@ -72,6 +72,11 @@ DEFINE_EVENT(xhci_log_msg, xhci_dbg_ring_expansion, TP_ARGS(vaf) ); +DEFINE_EVENT(xhci_log_msg, xhci_ring_mem_detail, + TP_PROTO(struct va_format *vaf), + TP_ARGS(vaf) +); + DECLARE_EVENT_CLASS(xhci_log_ctx, TP_PROTO(struct xhci_hcd *xhci, struct xhci_container_ctx *ctx, unsigned int ep_num), -- 2.7.4