Message ID | 50FF4BC9.1060206@candelatech.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote: > diff --git a/net/core/dst.c b/net/core/dst.c > index ee6153e..234b168 100644 > --- a/net/core/dst.c > +++ b/net/core/dst.c > @@ -245,6 +245,7 @@ again: > dst->ops->destroy(dst); > if (dst->dev) > dev_put(dst->dev); > + dst->input = dst->output = 0xdeadbeef; > kmem_cache_free(dst->ops->kmem_cachep, dst); Great ! You could comment the kmem_cache_free() as well to get better chances to hit the bug, and maybe start a bisection to find the faulty commit ? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/22/2013 10:11 PM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote: > >> diff --git a/net/core/dst.c b/net/core/dst.c >> index ee6153e..234b168 100644 >> --- a/net/core/dst.c >> +++ b/net/core/dst.c >> @@ -245,6 +245,7 @@ again: >> dst->ops->destroy(dst); >> if (dst->dev) >> dev_put(dst->dev); >> + dst->input = dst->output = 0xdeadbeef; >> kmem_cache_free(dst->ops->kmem_cachep, dst); > > Great ! > > You could comment the kmem_cache_free() as well to get better chances to > hit the bug, and maybe start a bisection to find the faulty commit ? I suspect the bug goes back at least as far as 3.3. And since I need the NFS patches for this test case, bisecting will be pure hell. I'll work on some more code instrumentation tomorrow. One thing that came to mind while I was looking at the code today: How are the non-ref-counted dst objects used safely? Any chance that tearing down the IP protocol on a device (or deleting a device) could delete a dst that is referenced by an skb (and thus crashes as I see)? Thanks, Ben
On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > How are the non-ref-counted dst objects used safely? Any chance > that tearing down the IP protocol on a device (or deleting a device) > could delete a dst that is referenced by an skb (and thus crashes as > I see)? There is probably a bug. Normally it should be RCU protected. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > I suspect the bug goes back at least as far as 3.3. And since > I need the NFS patches for this test case, bisecting will be pure hell. Make sure the bug is still present in net-next. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 05:35 AM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > >> How are the non-ref-counted dst objects used safely? Any chance >> that tearing down the IP protocol on a device (or deleting a device) >> could delete a dst that is referenced by an skb (and thus crashes as >> I see)? > > There is probably a bug. Normally it should be RCU protected. I'm building net-next, and will do some testing on it later today if all goes well. In the meantime, a few questions about RCU. If the non-ref-counted dst is protected by RCU, does that mean that from the time we acquire the pointer (ie, set it in the skb), to the time we clear the pointer from the skb, we must be under RCU read lock? If so, that means that ip_rcv_finish must be called under RCU read lock, etc? Btw, this seems non related to NFS, so I'm dropping them from CC after this email unless I hear a request otherwise. Thanks, Ben
On Wed, 2013-01-23 at 10:15 -0800, Ben Greear wrote: > On 01/23/2013 05:35 AM, Eric Dumazet wrote: > > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > > > >> How are the non-ref-counted dst objects used safely? Any chance > >> that tearing down the IP protocol on a device (or deleting a device) > >> could delete a dst that is referenced by an skb (and thus crashes as > >> I see)? > > > > There is probably a bug. Normally it should be RCU protected. > > I'm building net-next, and will do some testing on it later today > if all goes well. > > In the meantime, a few questions about RCU. > > If the non-ref-counted dst is protected by RCU, does that mean > that from the time we acquire the pointer (ie, set it in the skb), > to the time we clear the pointer from the skb, we must be under > RCU read lock? > > If so, that means that ip_rcv_finish must be called under RCU > read lock, etc? Obviously yes. Check skb_dst_force() for the rare cases we want to escape the RCU section. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 06:42 AM, Eric Dumazet wrote: > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote: > >> I suspect the bug goes back at least as far as 3.3. And since >> I need the NFS patches for this test case, bisecting will be pure hell. > > Make sure the bug is still present in net-next. > My test case isn't running well on net-next. Seems most of the file-io processes are hung trying to open a file: [root@lf1011-12060006 lanforge]# cat /proc/10612/stack [<ffffffff8115fadb>] do_last+0x1db/0xa4e [<ffffffff8116090b>] path_openat+0xcb/0x363 [<ffffffff81160ca8>] do_filp_open+0x38/0x84 [<ffffffff81153bfc>] do_sys_open+0x6d/0xff [<ffffffff81153cbb>] sys_open+0x1c/0x1e [<ffffffff8154f629>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff I'm going back to 3.7.3 and will try to learn some more about what it's doing there. Thanks, Ben
On 01/22/2013 06:32 PM, Ben Greear wrote: So, I'm slowly making some progress. I've verified that the skb has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish method. I'm trying to track it backwards and figure out which device it belongs to, etc....takes a while to reproduce though. One thing about this stack trace below...the dev_seq_stop() does a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() can cause dev_seq_stop() to run, but if this stack is legit, then maybe by the time we enter the ip_rcv_finish() code we are running without rcu_readlock() held? If so, that would probably explain the bug. > Call Trace: > [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308 > [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a > [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54 > [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb > [<ffffffff814a9142>] ip_rcv+0x237/0x269 > [<ffffffff81473def>] __netif_receive_skb+0x487/0x530 > [<ffffffff81473f91>] process_backlog+0xf9/0x1da > [<ffffffff8147639a>] net_rx_action+0xad/0x218 > [<ffffffff8108d50a>] __do_softirq+0x9c/0x161 > [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42 > [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259 > [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11 > [<ffffffff810a0a6d>] kthread+0xc2/0xca > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0 > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 ## This is from a slightly different kernel image...but probably this part is legit. 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466). 461 /* Our transport medium may have padded the buffer out. Now we know it 462 * is IP we can trim to the true length of the frame. 463 * Note this now means skb->len holds ntohs(iph->tot_len). 464 */ 465 if (pskb_trim_rcsum(skb, len)) { 466 IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS); 467 goto drop; 468 } 469 470 /* Remove any debris in the socket control block */
On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: > On 01/22/2013 06:32 PM, Ben Greear wrote: > > So, I'm slowly making some progress. I've verified that the skb > has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish > method. I'm trying to track it backwards and figure out which > device it belongs to, etc....takes a while to reproduce though. > > One thing about this stack trace below...the dev_seq_stop() does > a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() > can cause dev_seq_stop() to run, but if this stack is legit, > then maybe by the time we enter the ip_rcv_finish() code we are > running without rcu_readlock() held? > > If so, that would probably explain the bug. > The whole thing is run under rcu_read_lock() done in __netif_receive_skb() My suspicion was that we called netif_rx() from macvlan leaving a not refcounted skb dst. But the patch I sent to you didnt solve the bug, so its something else. You could trace at which point the dst was released. (where you set dst->input/output to deadbeef) > > Call Trace: > > [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308 > > [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a > > [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54 > > [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb > > [<ffffffff814a9142>] ip_rcv+0x237/0x269 > > [<ffffffff81473def>] __netif_receive_skb+0x487/0x530 > > [<ffffffff81473f91>] process_backlog+0xf9/0x1da > > [<ffffffff8147639a>] net_rx_action+0xad/0x218 > > [<ffffffff8108d50a>] __do_softirq+0x9c/0x161 > > [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42 > > [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259 > > [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11 > > [<ffffffff810a0a6d>] kthread+0xc2/0xca > > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > > [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0 > > [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56 > > > ## This is from a slightly different kernel image...but probably this part is legit. > > 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466). > 461 /* Our transport medium may have padded the buffer out. Now we know it > 462 * is IP we can trim to the true length of the frame. > 463 * Note this now means skb->len holds ntohs(iph->tot_len). > 464 */ > 465 if (pskb_trim_rcsum(skb, len)) { > 466 IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS); > 467 goto drop; > 468 } > 469 > 470 /* Remove any debris in the socket control block */ > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 01/23/2013 04:01 PM, Eric Dumazet wrote: > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: >> On 01/22/2013 06:32 PM, Ben Greear wrote: >> >> So, I'm slowly making some progress. I've verified that the skb >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish >> method. I'm trying to track it backwards and figure out which >> device it belongs to, etc....takes a while to reproduce though. >> >> One thing about this stack trace below...the dev_seq_stop() does >> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() >> can cause dev_seq_stop() to run, but if this stack is legit, >> then maybe by the time we enter the ip_rcv_finish() code we are >> running without rcu_readlock() held? >> >> If so, that would probably explain the bug. >> > > The whole thing is run under rcu_read_lock() done in > __netif_receive_skb() I was worried that the dev_seq_stop might be called incorrectly causing an asymetric unlock. I have no idea how that might happened, but several crashes have that dev_seq_stop method listed, so it got me suspicious. > > My suspicion was that we called netif_rx() from macvlan leaving a > not refcounted skb dst. > > But the patch I sent to you didnt solve the bug, so its something else. > > You could trace at which point the dst was released. (where you set > dst->input/output to deadbeef) My current code is in some garbage collector timer code, but I can work on saving the call-site that first pokes the dst into the garbage collection list... Thanks, Ben
On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote: > On 01/23/2013 04:01 PM, Eric Dumazet wrote: > > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote: > >> On 01/22/2013 06:32 PM, Ben Greear wrote: > >> > >> So, I'm slowly making some progress. I've verified that the skb > >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish > >> method. I'm trying to track it backwards and figure out which > >> device it belongs to, etc....takes a while to reproduce though. > >> > >> One thing about this stack trace below...the dev_seq_stop() does > >> a rcu read-unlock. Now, I can't figure out exactly how ip_rcv() > >> can cause dev_seq_stop() to run, but if this stack is legit, > >> then maybe by the time we enter the ip_rcv_finish() code we are > >> running without rcu_readlock() held? > >> > >> If so, that would probably explain the bug. > >> > > > > The whole thing is run under rcu_read_lock() done in > > __netif_receive_skb() > > I was worried that the dev_seq_stop might be called > incorrectly causing an asymetric unlock. I have no > idea how that might happened, but several crashes > have that dev_seq_stop method listed, so it got me suspicious. dev_seq_stop() is some word in the kernel stack, result of a prior system call. Stack is not cleanup. Each function reserves an amount of stack but not always write on all reserved space (some automatic variables might be not set) Note the "? " before the name : linux printed the symbol but this was not a call site for this particular call graph. Its only an extra indication, that can be useful sometimes. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dst.c b/net/core/dst.c index ee6153e..234b168 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -245,6 +245,7 @@ again: dst->ops->destroy(dst); if (dst->dev) dev_put(dst->dev); + dst->input = dst->output = 0xdeadbeef; kmem_cache_free(dst->ops->kmem_cachep, dst); dst = child;