Message ID | 20231213213735.434249-2-thinker.li@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Fix dangling pointer at f6i->gc_link. | expand |
On 12/13/23 2:37 PM, thinker.li@gmail.com wrote: > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index b132feae3393..dcaeb88d73aa 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, > rt->dst_nocount = true; > > if (cfg->fc_flags & RTF_EXPIRES) > - fib6_set_expires_locked(rt, jiffies + > - clock_t_to_jiffies(cfg->fc_expires)); > + __fib6_set_expires(rt, jiffies + > + clock_t_to_jiffies(cfg->fc_expires)); > else > - fib6_clean_expires_locked(rt); > + __fib6_clean_expires(rt); as Eric noted in a past comment, the clean is not needed in this function since memory is initialized to 0 (expires is never set). Also, this patch set does not fundamentally change the logic, so it cannot fix the bug reported in https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/ please hold off future versions of this set until the problem in that stack traced is fixed. I have tried a few things using RA's, but have not been able to recreate UAF.
On 12/13/23 22:11, David Ahern wrote: > On 12/13/23 2:37 PM, thinker.li@gmail.com wrote: >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >> index b132feae3393..dcaeb88d73aa 100644 >> --- a/net/ipv6/route.c >> +++ b/net/ipv6/route.c >> @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, >> rt->dst_nocount = true; >> >> if (cfg->fc_flags & RTF_EXPIRES) >> - fib6_set_expires_locked(rt, jiffies + >> - clock_t_to_jiffies(cfg->fc_expires)); >> + __fib6_set_expires(rt, jiffies + >> + clock_t_to_jiffies(cfg->fc_expires)); >> else >> - fib6_clean_expires_locked(rt); >> + __fib6_clean_expires(rt); > > as Eric noted in a past comment, the clean is not needed in this > function since memory is initialized to 0 (expires is never set). > > Also, this patch set does not fundamentally change the logic, so it > cannot fix the bug reported in > > https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/ > > please hold off future versions of this set until the problem in that > stack traced is fixed. I have tried a few things using RA's, but have > not been able to recreate UAF. Sure
On 12/13/23 22:11, David Ahern wrote: > On 12/13/23 2:37 PM, thinker.li@gmail.com wrote: >> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >> index b132feae3393..dcaeb88d73aa 100644 >> --- a/net/ipv6/route.c >> +++ b/net/ipv6/route.c >> @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, >> rt->dst_nocount = true; >> >> if (cfg->fc_flags & RTF_EXPIRES) >> - fib6_set_expires_locked(rt, jiffies + >> - clock_t_to_jiffies(cfg->fc_expires)); >> + __fib6_set_expires(rt, jiffies + >> + clock_t_to_jiffies(cfg->fc_expires)); >> else >> - fib6_clean_expires_locked(rt); >> + __fib6_clean_expires(rt); > > as Eric noted in a past comment, the clean is not needed in this > function since memory is initialized to 0 (expires is never set). > > Also, this patch set does not fundamentally change the logic, so it > cannot fix the bug reported in > > https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/ > > please hold off future versions of this set until the problem in that > stack traced is fixed. I have tried a few things using RA's, but have > not been able to recreate UAF. I tried to reproduce the issue yesterday, according to the hypothesis behind the patch of this thread. The following is the instructions to reproduce the UAF issue. However, this instruction doesn't create a crash at the place since the memory is still available even it has been free. But, the log shows a UAF. The patch at the end of this message is required to reproduce and show UAF. The most critical change in the patch is to insert a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us a chance to manipulate the kernel to reproduce the UAF. Here is my conclusion. There is no protection between finding a route and changing the route in rt6_route_rcv(), including inserting the entry to the gc list. It is possible to insert an entry that will be free later to the gc list, causing a UAF. There is more explanations along with the following logs. Instructions: - Preparation - install ipv6toolkit on the host. - run qemu with a patched kernel as a guest through with the host through qemubr0 a bridge. - On the guest - stop systemd-networkd.service & systemd-networkd.socket if there are. - sysctl -w net.ipv6.conf.enp0s3.accept_ra=2 - sysctl -w net.ipv6.conf.enp0s3.accept_ra_rt_info_max_plen=127 - Assume the address of qemubr0 in the host is fe80::4ce9:92ff:fe27:75df. - Test - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the host - sleep 2; ip -6 route del fe82::/64 # On the guest - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the host - ra6 -i qemubr0 -d ff02::1 -R 'fe81::/64#1#300' -t 0 # On the host - The step 3 should start immediately after step 2 since we have a gap of merely 3 seconds in the kernel. The log generated by the test: # this is the log triggered by step 1 qemu login: [ 4.673867] __ip6_ins_rt fe80::5054:ff:fe12:3456/128 [ 82.138139] rt = 0000000000000000 [ 82.138731] fib6_info_alloc: ffff888103950200 [ 82.139088] __ip6_ins_rt ::/0 [ 82.139993] fib6_add: add ffff888103950200 to gc list ffff888103950238 pprev ffff888102878200 next 0000000000000000 [ 82.141719] rt = ffff888103950200 [ 82.141748] rt6_route_rcv [ 82.141748] fib6_info_alloc: ffff88810093be00 [ 82.141748] __ip6_ins_rt fe82::/64 [ 82.141748] rt6_route_rcv: route info ffff88810093be00, sleep 3s [ 85.121803] fib6_set_expires_locked: add ffff88810093be00 to gc list ffff88810093be38 pprev ffff888102878200 next ffff888103950238 [ 85.146497] rt6_route_rcv: route info ffff88810093be00, after release # This is the log triggered by step 2 & 3. # # The line containing fib6_clean_expires_locked is specifically # triggered by step 2. Step 2 removes the entry from the tree and # gc list. Step 3 free the entry by calling # fib6_info_release() at the very end of rt6_route_rcv() since it is # the last owner of that entry. fib6_info_destroy_rcu proves that. # # However, even the entry will be free later, rt6_route_rcv() still add # the entry back to the gc list before freeing the entry. In other # words, it create an entry (ffff88810093be38) that is free in # a gc list. # # Keep an eye on ffff88810093be38 and ffff88810093be00. ffff88810093be00 # is the address of a fib6_info and ffff88810093be38 is the address of # its gc_link. # # This log also complies with the log in # # https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/ # # The entry is free by the call of fib6_info_release() in # rt6_route_rcv(). [ 105.158140] rt = ffff888103950200 [ 105.158590] rt6_route_rcv [ 105.158924] rt6_route_rcv: route info ffff88810093be00, sleep 3s [ 106.368875] fib6_clean_expires_locked: del ffff88810093be00 from gc list pprev ffff888102878200 next ffff888103950238 [ 107.201815] fib6_set_expires_locked: add ffff88810093be00 to gc list ffff88810093be38 pprev ffff888102878200 next ffff888103950238 [ 108.159815] rt6_route_rcv: route info ffff88810093be00, after release [ 108.168807] fib6_info_destroy_rcu ffff88810093be00 # This is the log triggered by step 4. # The line containing fib6_clean_expires_locked shows the free entry # mentioned in the previous part is still in the gc list. (pprev # ffff88810093be38) # Since fib6_clean_expires_locked() calls hlist_del_init() to remove # an entry from the gc list, it will change the value of *pprev # (ffff88810093be38). It causes an UAF case. [ 131.882130] rt = ffff888103950200 [ 131.882567] __ip6_del_rt ::/0 [ 131.882932] fib6_clean_expires_locked: del ffff888103950200 from gc list pprev ffff88810093be38 next 0000000000000000 [ 131.883296] rt6_route_rcv [ 131.883296] fib6_info_alloc: ffff88810093be00 [ 131.884517] __ip6_ins_rt fe81::/64 [ 131.884517] rt6_route_rcv: route info ffff88810093be00, sleep 3s [ 134.305866] fib6_set_expires_locked: add ffff88810093be00 to gc list ffff88810093be38 pprev ffff888102878200 next ffff88810093be38 [ 134.537866] rt6_route_rcv: route info ffff88810093be00, after release [ 134.896801] fib6_info_destroy_rcu ffff888103950200 # The following log is the kernel errors that printed after 10s seconds. [ 168.321812] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [swapper/3:0] [ 196.289823] watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [swapper/3:0] [ 214.244784] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 214.245723] rcu: 3-....: (7 ticks this GP) idle=198c/0/0x1 softirq=2002/2003 fqs=12309 [ 214.245774] rcu: (detected by 2, t=60002 jiffies, g=1213, q=282 ncpus=4) [ 240.321823] watchdog: BUG: soft lockup - CPU#3 stuck for 93s! [swapper/3:0] [ 268.353804] watchdog: BUG: soft lockup - CPU#3 stuck for 119s! [swapper/3:0] [ 296.388805] watchdog: BUG: soft lockup - CPU#3 stuck for 146s! [swapper/3:0] diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 1ba9f4ddf2f6..6e059ba3d2d0 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -510,8 +510,11 @@ static inline void fib6_set_expires_locked(struct fib6_info *f6i, tb6 = f6i->fib6_table; f6i->expires = expires; - if (tb6 && !fib6_has_expires(f6i)) + if (tb6 && !fib6_has_expires(f6i)) { hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist); + printk(KERN_CRIT "fib6_set_expires_locked: add %px to gc list %px pprev %px next %px\n", + f6i, &f6i->gc_link, f6i->gc_link.pprev, f6i->gc_link.next); + } f6i->fib6_flags |= RTF_EXPIRES; } @@ -529,8 +532,11 @@ static inline void fib6_set_expires(struct fib6_info *f6i, static inline void fib6_clean_expires_locked(struct fib6_info *f6i) { - if (fib6_has_expires(f6i)) + if (fib6_has_expires(f6i)) { + printk(KERN_CRIT "fib6_clean_expires_locked: del %px from gc list pprev %px next %px\n", + f6i, f6i->gc_link.pprev, f6i->gc_link.next); hlist_del_init(&f6i->gc_link); + } f6i->fib6_flags &= ~RTF_EXPIRES; f6i->expires = 0; } diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 28b01a068412..b275b9798b5e 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -162,6 +162,8 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags, bool with_fib6_nh) INIT_HLIST_NODE(&f6i->gc_link); + printk(KERN_CRIT "fib6_info_alloc: %px\n", f6i); + return f6i; } @@ -178,6 +180,7 @@ void fib6_info_destroy_rcu(struct rcu_head *head) ip_fib_metrics_put(f6i->fib6_metrics); kfree(f6i); + printk(KERN_CRIT "fib6_info_destroy_rcu %px\n", f6i); } EXPORT_SYMBOL_GPL(fib6_info_destroy_rcu); @@ -1486,8 +1489,10 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt, list_add(&rt->nh_list, &rt->nh->f6i_list); __fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net)); - if (fib6_has_expires(rt)) + if (fib6_has_expires(rt)) { hlist_add_head(&rt->gc_link, &table->tb6_gc_hlist); + printk(KERN_CRIT "fib6_add: add %px to gc list %px pprev %px next %px\n", rt, &rt->gc_link, rt->gc_link.pprev, rt->gc_link.next); + } fib6_start_gc(info->nl_net, rt); } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b132feae3393..52283a80f79e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -935,6 +935,8 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, unsigned long lifetime; struct fib6_info *rt; + printk(KERN_CRIT "rt6_route_rcv\n"); + if (len < sizeof(struct route_info)) { return -EINVAL; } @@ -989,12 +991,15 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, int len, (rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref); if (rt) { + printk(KERN_CRIT "rt6_route_rcv: route info %px, sleep 3s\n", rt); + mdelay(3000); if (!addrconf_finite_timeout(lifetime)) fib6_clean_expires(rt); else fib6_set_expires(rt, jiffies + HZ * lifetime); fib6_info_release(rt); + printk(KERN_CRIT "rt6_route_rcv: route info %px, after release\n", rt); } return 0; } @@ -1297,6 +1302,9 @@ static int __ip6_ins_rt(struct fib6_info *rt, struct nl_info *info, { int err; struct fib6_table *table; + if (rt) + printk(KERN_CRIT "__ip6_ins_rt %pI6c/%d\n", + &rt->fib6_dst.addr, rt->fib6_dst.plen); table = rt->fib6_table; spin_lock_bh(&table->tb6_lock); @@ -3855,6 +3863,9 @@ static int __ip6_del_rt(struct fib6_info *rt, struct nl_info *info) struct net *net = info->nl_net; struct fib6_table *table; int err; + if (rt) + printk(KERN_CRIT "__ip6_del_rt %pI6c/%d\n", + &rt->fib6_dst.addr, rt->fib6_dst.plen); if (rt == net->ipv6.fib6_null_entry) { err = -ENOENT; @@ -4345,8 +4356,10 @@ struct fib6_info *rt6_get_dflt_router(struct net *net, ipv6_addr_equal(&nh->fib_nh_gw6, addr)) break; } - if (rt && !fib6_info_hold_safe(rt)) + if (rt && !fib6_info_hold_safe(rt)) { rt = NULL; + } + printk(KERN_CRIT "rt = %px\n", rt); rcu_read_unlock(); return rt; }
On 12/15/23 11:12 AM, Kui-Feng Lee wrote: > > I tried to reproduce the issue yesterday, according to the hypothesis > behind the patch of this thread. The following is the instructions > to reproduce the UAF issue. However, this instruction doesn't create > a crash at the place since the memory is still available even it has > been free. But, the log shows a UAF. > > The patch at the end of this message is required to reproduce and > show UAF. The most critical change in the patch is to insert > a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us > a chance to manipulate the kernel to reproduce the UAF. > > Here is my conclusion. There is no protection between finding > a route and changing the route in rt6_route_rcv(), including inserting > the entry to the gc list. It is possible to insert an entry that will be > free later to the gc list, causing a UAF. There is more explanations > along with the following logs. TL;DR: I think 3dec89b14d37 should be reverted for 6.6 and 6.7 (selftests should be valid with or without this change) and you try again for 6.9 (6.7 dev cycle is about to close). ### I was successful in triggering UAF twice yesterday, but a slightly different code path that in Eric's trace: This is the WARN_ON for !hlist_unhashed in fib6_info_release: [ 46.926339] ------------[ cut here ]------------ [ 46.926368] WARNING: CPU: 3 PID: 0 at include/net/ip6_fib.h:332 fib6_info_release+0x2a/0x43 [ 46.926384] Modules linked in: [ 46.926393] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.7.0-rc4-debug+ #16 [ 46.926399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 46.926404] RIP: 0010:fib6_info_release+0x2a/0x43 [ 46.926409] Code: 48 85 ff 74 3d 55 48 89 e5 53 48 89 fb 48 8d 7f 2c e8 3e ff ff ff 84 c0 74 25 48 8d 7b 40 e8 33 11 8b ff 48 83 7b 40 00 74 02 <0f> 0b 48 8d bb a0 00 00 00 48 c7 c6 93 ea ae 81 e8 3f 00 66 ff 5b [ 46.926416] RSP: 0018:ffffc900002a8390 EFLAGS: 00010282 [ 46.926422] RAX: 1ffff11002048b00 RBX: ffff888010245c00 RCX: ffffffff81af8c9e [ 46.926426] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: ffff888010245c40 [ 46.926431] RBP: ffffc900002a8398 R08: ffffed1002048b86 R09: 0000000000000001 [ 46.926435] R10: ffffffff81af8be4 R11: ffff888010245c2f R12: ffff88800aeec000 [ 46.926439] R13: 0000000000000000 R14: ffff88801ab7c200 R15: 0000000000000020 [ 46.926444] FS: 0000000000000000(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000 [ 46.926448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 46.926452] CR2: 00007f897cc95fb0 CR3: 000000001d69d000 CR4: 0000000000750ef0 [ 46.926458] PKRU: 55555554 [ 46.926462] Call Trace: [ 46.926466] <IRQ> [ 46.926470] ? show_regs+0x5c/0x60 [ 46.926478] ? fib6_info_release+0x2a/0x43 [ 46.926483] ? __warn+0xcb/0x19c [ 46.926489] ? fib6_info_release+0x2a/0x43 [ 46.926495] ? report_bug+0x114/0x186 [ 46.926504] ? handle_bug+0x40/0x6b [ 46.926510] ? exc_invalid_op+0x19/0x41 [ 46.926515] ? asm_exc_invalid_op+0x1b/0x20 [ 46.926524] ? refcount_dec_and_test+0x15/0x43 [ 46.926530] ? fib6_info_release+0x23/0x43 [ 46.926536] ? fib6_info_release+0x2a/0x43 [ 46.926542] ndisc_router_discovery+0xf41/0xfa6 UAF here: [ 47.941928] ================================================================== [ 47.942548] BUG: KASAN: slab-use-after-free in fib6_gc_all.constprop.0+0x19b/0x294 [ 47.943178] Read of size 8 at addr ffff888010245c38 by task swapper/3/0 [ 47.943866] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 6.7.0-rc4-debug+ #16 [ 47.944548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 47.945204] Call Trace: [ 47.945416] <IRQ> [ 47.945595] dump_stack_lvl+0x5b/0x82 [ 47.945912] print_address_description.constprop.0+0x7a/0x2eb [ 47.946391] print_report+0x106/0x1e0 [ 47.946703] ? virt_to_slab+0x9/0x1a [ 47.947007] ? kmem_cache_debug_flags+0x13/0x1f [ 47.947392] ? kasan_complete_mode_report_info+0x19e/0x1a1 [ 47.947850] ? fib6_gc_all.constprop.0+0x19b/0x294 [ 47.948254] kasan_report+0x99/0xc4 [ 47.948554] ? fib6_gc_all.constprop.0+0x19b/0x294 [ 47.948962] __asan_load8+0x77/0x79 [ 47.949262] fib6_gc_all.constprop.0+0x19b/0x294 for a route added here: [ 47.970092] Allocated by task 0: [ 47.970366] stack_trace_save+0x8d/0xbb [ 47.970373] kasan_save_stack+0x26/0x46 [ 47.970379] kasan_set_track+0x25/0x2b [ 47.970385] kasan_save_alloc_info+0x1e/0x21 [ 47.970392] ____kasan_kmalloc+0x6f/0x7b [ 47.970398] __kasan_kmalloc+0x9/0xb [ 47.970404] __kmalloc+0x91/0xbf [ 47.970411] kzalloc+0xf/0x11 [ 47.970416] fib6_info_alloc+0x26/0xa1 [ 47.970422] ip6_route_info_create+0x266/0x6c5 [ 47.970428] ip6_route_add+0x14/0x46 [ 47.970433] rt6_add_dflt_router+0x123/0x1bd [ 47.970439] ndisc_router_discovery+0x5a4/0xfa6 [ 47.970447] ndisc_rcv+0x1a2/0x1b5 This is just aggressive RA's and aggressive gc with a hack to the stack to toggle lifetime or metric on every RA (something that can be scripted with an ra command - set and reset lifetime in every other RA and change the metric in every RA). I was targeting this code path because I noticed rt6_add_dflt_router sets RTF_EXPIRES but does not pass in the expires value. There are races (like the one you found with a different code path) in the handling of RTF_EXPIRES, setting the timer, running gc and adding the route entry to the gc list. In short there are a number of places in the RA path that need the lifetime handling cleaned up first to make it all consistent with the idea of using a linked list to track entries with an expires tag.
On 12/16/23 10:36, David Ahern wrote: > On 12/15/23 11:12 AM, Kui-Feng Lee wrote: >> >> I tried to reproduce the issue yesterday, according to the hypothesis >> behind the patch of this thread. The following is the instructions >> to reproduce the UAF issue. However, this instruction doesn't create >> a crash at the place since the memory is still available even it has >> been free. But, the log shows a UAF. >> >> The patch at the end of this message is required to reproduce and >> show UAF. The most critical change in the patch is to insert >> a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us >> a chance to manipulate the kernel to reproduce the UAF. >> >> Here is my conclusion. There is no protection between finding >> a route and changing the route in rt6_route_rcv(), including inserting >> the entry to the gc list. It is possible to insert an entry that will be >> free later to the gc list, causing a UAF. There is more explanations >> along with the following logs. > > TL;DR: I think 3dec89b14d37 should be reverted for 6.6 and 6.7 > (selftests should be valid with or without this change) and you try > again for 6.9 (6.7 dev cycle is about to close). > > ### > > I was successful in triggering UAF twice yesterday, but a slightly > different code path that in Eric's trace: > > This is the WARN_ON for !hlist_unhashed in fib6_info_release: > > [ 46.926339] ------------[ cut here ]------------ > [ 46.926368] WARNING: CPU: 3 PID: 0 at include/net/ip6_fib.h:332 > fib6_info_release+0x2a/0x43 > [ 46.926384] Modules linked in: > [ 46.926393] CPU: 3 PID: 0 Comm: swapper/3 Not tainted > 6.7.0-rc4-debug+ #16 > [ 46.926399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > 1.15.0-1 04/01/2014 > [ 46.926404] RIP: 0010:fib6_info_release+0x2a/0x43 > [ 46.926409] Code: 48 85 ff 74 3d 55 48 89 e5 53 48 89 fb 48 8d 7f 2c > e8 3e ff ff ff 84 c0 74 25 48 8d 7b 40 e8 33 11 8b ff 48 83 7b 40 00 74 > 02 <0f> 0b 48 8d bb a0 00 00 00 48 c7 c6 93 ea ae 81 e8 3f 00 66 ff 5b > [ 46.926416] RSP: 0018:ffffc900002a8390 EFLAGS: 00010282 > [ 46.926422] RAX: 1ffff11002048b00 RBX: ffff888010245c00 RCX: > ffffffff81af8c9e > [ 46.926426] RDX: dffffc0000000000 RSI: 0000000000000004 RDI: > ffff888010245c40 > [ 46.926431] RBP: ffffc900002a8398 R08: ffffed1002048b86 R09: > 0000000000000001 > [ 46.926435] R10: ffffffff81af8be4 R11: ffff888010245c2f R12: > ffff88800aeec000 > [ 46.926439] R13: 0000000000000000 R14: ffff88801ab7c200 R15: > 0000000000000020 > [ 46.926444] FS: 0000000000000000(0000) GS:ffff88806c000000(0000) > knlGS:0000000000000000 > [ 46.926448] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 46.926452] CR2: 00007f897cc95fb0 CR3: 000000001d69d000 CR4: > 0000000000750ef0 > [ 46.926458] PKRU: 55555554 > [ 46.926462] Call Trace: > [ 46.926466] <IRQ> > [ 46.926470] ? show_regs+0x5c/0x60 > [ 46.926478] ? fib6_info_release+0x2a/0x43 > [ 46.926483] ? __warn+0xcb/0x19c > [ 46.926489] ? fib6_info_release+0x2a/0x43 > [ 46.926495] ? report_bug+0x114/0x186 > [ 46.926504] ? handle_bug+0x40/0x6b > [ 46.926510] ? exc_invalid_op+0x19/0x41 > [ 46.926515] ? asm_exc_invalid_op+0x1b/0x20 > [ 46.926524] ? refcount_dec_and_test+0x15/0x43 > [ 46.926530] ? fib6_info_release+0x23/0x43 > [ 46.926536] ? fib6_info_release+0x2a/0x43 > [ 46.926542] ndisc_router_discovery+0xf41/0xfa6 > > > UAF here: > > [ 47.941928] > ================================================================== > [ 47.942548] BUG: KASAN: slab-use-after-free in > fib6_gc_all.constprop.0+0x19b/0x294 > [ 47.943178] Read of size 8 at addr ffff888010245c38 by task swapper/3/0 > > [ 47.943866] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W > 6.7.0-rc4-debug+ #16 > [ 47.944548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > 1.15.0-1 04/01/2014 > [ 47.945204] Call Trace: > [ 47.945416] <IRQ> > [ 47.945595] dump_stack_lvl+0x5b/0x82 > [ 47.945912] print_address_description.constprop.0+0x7a/0x2eb > [ 47.946391] print_report+0x106/0x1e0 > [ 47.946703] ? virt_to_slab+0x9/0x1a > [ 47.947007] ? kmem_cache_debug_flags+0x13/0x1f > [ 47.947392] ? kasan_complete_mode_report_info+0x19e/0x1a1 > [ 47.947850] ? fib6_gc_all.constprop.0+0x19b/0x294 > [ 47.948254] kasan_report+0x99/0xc4 > [ 47.948554] ? fib6_gc_all.constprop.0+0x19b/0x294 > [ 47.948962] __asan_load8+0x77/0x79 > [ 47.949262] fib6_gc_all.constprop.0+0x19b/0x294 > > > for a route added here: > > [ 47.970092] Allocated by task 0: > [ 47.970366] stack_trace_save+0x8d/0xbb > [ 47.970373] kasan_save_stack+0x26/0x46 > [ 47.970379] kasan_set_track+0x25/0x2b > [ 47.970385] kasan_save_alloc_info+0x1e/0x21 > [ 47.970392] ____kasan_kmalloc+0x6f/0x7b > [ 47.970398] __kasan_kmalloc+0x9/0xb > [ 47.970404] __kmalloc+0x91/0xbf > [ 47.970411] kzalloc+0xf/0x11 > [ 47.970416] fib6_info_alloc+0x26/0xa1 > [ 47.970422] ip6_route_info_create+0x266/0x6c5 > [ 47.970428] ip6_route_add+0x14/0x46 > [ 47.970433] rt6_add_dflt_router+0x123/0x1bd > [ 47.970439] ndisc_router_discovery+0x5a4/0xfa6 > [ 47.970447] ndisc_rcv+0x1a2/0x1b5 > > This is just aggressive RA's and aggressive gc with a hack to the stack > to toggle lifetime or metric on every RA (something that can be scripted > with an ra command - set and reset lifetime in every other RA and change > the metric in every RA). > > I was targeting this code path because I noticed rt6_add_dflt_router > sets RTF_EXPIRES but does not pass in the expires value. There are races > (like the one you found with a different code path) in the handling of > RTF_EXPIRES, setting the timer, running gc and adding the route entry to > the gc list. I am not sure what you mean a race in rt6_add_dflt_router(). In this function, it calls ip6_route_add() to add a default route with RTF_EXPIRES. In ip6_route_add(), it calls ip6_route_info_create() to create a f6i, and calls __ip6_ins_rt() to add the entry to gc list. Although there is a gap between ip6_route_info_create() and __ip6_ins_rt() without any protection, it should not cause a race. The newly created entry is not available to the rest of the system until __ip6_ins_rt() adds it to the tree. And, adding the entry to the gc list and adding the entry to the tree are performed atomically, being protected by table->tb6_lock. So, it should not have a race between adding to tree and adding to gc list if this is what you mean. By the way, do you happen to have a chance to try the patch here in you tests? It fixed the issue for my test scenario. According to your stacktraces, it should be the same issue happening in my test. Somewhere adds a entry to gc list by faultily believing that the entry is still on a tree. And, the patch here fixes this faultily believe. > > In short there are a number of places in the RA path that need the > lifetime handling cleaned up first to make it all consistent with the > idea of using a linked list to track entries with an expires tag.
On 12/15/23 11:12, Kui-Feng Lee wrote: > > > On 12/13/23 22:11, David Ahern wrote: >> On 12/13/23 2:37 PM, thinker.li@gmail.com wrote: >>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c >>> index b132feae3393..dcaeb88d73aa 100644 >>> --- a/net/ipv6/route.c >>> +++ b/net/ipv6/route.c >>> @@ -3763,10 +3763,10 @@ static struct fib6_info >>> *ip6_route_info_create(struct fib6_config *cfg, >>> rt->dst_nocount = true; >>> if (cfg->fc_flags & RTF_EXPIRES) >>> - fib6_set_expires_locked(rt, jiffies + >>> - clock_t_to_jiffies(cfg->fc_expires)); >>> + __fib6_set_expires(rt, jiffies + >>> + clock_t_to_jiffies(cfg->fc_expires)); >>> else >>> - fib6_clean_expires_locked(rt); >>> + __fib6_clean_expires(rt); >> >> as Eric noted in a past comment, the clean is not needed in this >> function since memory is initialized to 0 (expires is never set). >> >> Also, this patch set does not fundamentally change the logic, so it >> cannot fix the bug reported in >> >> https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/ >> >> please hold off future versions of this set until the problem in that >> stack traced is fixed. I have tried a few things using RA's, but have >> not been able to recreate UAF. > > I tried to reproduce the issue yesterday, according to the hypothesis > behind the patch of this thread. The following is the instructions > to reproduce the UAF issue. However, this instruction doesn't create > a crash at the place since the memory is still available even it has > been free. But, the log shows a UAF. > > The patch at the end of this message is required to reproduce and > show UAF. The most critical change in the patch is to insert > a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us > a chance to manipulate the kernel to reproduce the UAF. > > Here is my conclusion. There is no protection between finding > a route and changing the route in rt6_route_rcv(), including inserting > the entry to the gc list. It is possible to insert an entry that will be > free later to the gc list, causing a UAF. There is more explanations > along with the following logs. > > Instructions: > - Preparation > - install ipv6toolkit on the host. > - run qemu with a patched kernel as a guest through > with the host through qemubr0 a bridge. > - On the guest > - stop systemd-networkd.service & systemd-networkd.socket if > there are. > - sysctl -w net.ipv6.conf.enp0s3.accept_ra=2 > - sysctl -w net.ipv6.conf.enp0s3.accept_ra_rt_info_max_plen=127 > - Assume the address of qemubr0 in the host is > fe80::4ce9:92ff:fe27:75df. > - Test > - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the > host > - sleep 2; ip -6 route del fe82::/64 # On the > guest > - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the > host > - ra6 -i qemubr0 -d ff02::1 -R 'fe81::/64#1#300' -t 0 # On the > host > - The step 3 should start immediately after step 2 since we have > a gap of merely 3 seconds in the kernel. > I did the same test with the fix patch along with this thread. The following is the log I got. # This is the log printed by the step 1. # ffff888108066600 was add to the gc list. [ 4.596875] __ip6_ins_rt fe80::5054:ff:fe12:3456/128 [ 38.925482] rt = 0000000000000000 [ 38.925916] fib6_info_alloc: ffff888108066e00 [ 38.926441] __ip6_ins_rt ::/0 [ 38.926441] fib6_add: add ffff888108066e00 to gc list ffff888108066e38 pprev ffff888102a02800 next 0000000000000000 [ 38.927084] rt = ffff888108066e00 [ 38.927084] rt6_route_rcv [ 38.927084] fib6_info_alloc: ffff888108066600 [ 38.929333] __ip6_ins_rt fe82::/64 [ 38.929333] rt6_route_rcv: route info ffff888108066600, sleep 3s [ 40.948831] fib6_set_expires_locked: add ffff888108066600 to gc list ffff888108066638 pprev ffff888102a02800 next ffff888108066e38 [ 41.932501] rt6_route_rcv: route info ffff888108066600, after release # This is the log printed by the step 2 & 3. # The entry (ffff888108066600) removed from the gc list by # fib6_clean_expires_locked was not added back to the gc list again. [ 68.173441] rt = ffff888108066e00 [ 68.173883] rt6_route_rcv [ 68.174245] rt6_route_rcv: route info ffff888108066600, sleep 3s [ 69.389112] fib6_clean_expires_locked: del ffff888108066600 from gc list pprev ffff888102a02800 next ffff888108066e38 [ 70.900831] rt6_route_rcv: route info ffff888108066600, after release [ 71.182822] fib6_info_destroy_rcu ffff888108066600 # This is the log printed by the step 4. [ 109.017446] rt = ffff888108066e00 [ 109.017893] __ip6_del_rt ::/0 [ 109.018288] fib6_clean_expires_locked: del ffff888108066e00 from gc list pprev ffff888102a02800 next 0000000000000000 [ 109.019471] rt6_route_rcv [ 109.019471] fib6_info_alloc: ffff888108066600 [ 109.019471] __ip6_ins_rt fe81::/64 [ 109.019471] rt6_route_rcv: route info ffff888108066600, sleep 3s [ 111.028831] fib6_set_expires_locked: add ffff888108066600 to gc list ffff888108066638 pprev ffff888102a02800 next 0000000000000000 [ 112.023607] rt6_route_rcv: route info ffff888108066600, after release [ 112.031825] fib6_info_destroy_rcu ffff888108066e00
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h index 1ba9f4ddf2f6..1213722c394f 100644 --- a/include/net/ip6_fib.h +++ b/include/net/ip6_fib.h @@ -500,21 +500,47 @@ void fib6_gc_cleanup(void); int fib6_init(void); -/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be - * NULL. - */ -static inline void fib6_set_expires_locked(struct fib6_info *f6i, - unsigned long expires) +static inline void fib6_add_gc_list(struct fib6_info *f6i) { struct fib6_table *tb6; tb6 = f6i->fib6_table; - f6i->expires = expires; - if (tb6 && !fib6_has_expires(f6i)) + if (tb6 && + rcu_dereference_protected(f6i->fib6_node, + lockdep_is_held(&tb6->tb6_lock)) && + hlist_unhashed(&f6i->gc_link)) hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist); +} + +static inline void fib6_del_gc_list(struct fib6_info *f6i) +{ + if (!hlist_unhashed(&f6i->gc_link)) + hlist_del_init(&f6i->gc_link); +} + +static inline void __fib6_set_expires(struct fib6_info *f6i, + unsigned long expires) +{ + f6i->expires = expires; f6i->fib6_flags |= RTF_EXPIRES; } +static inline void __fib6_clean_expires(struct fib6_info *f6i) +{ + f6i->fib6_flags &= ~RTF_EXPIRES; + f6i->expires = 0; +} + +/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be + * NULL. + */ +static inline void fib6_set_expires_locked(struct fib6_info *f6i, + unsigned long expires) +{ + __fib6_set_expires(f6i, expires); + fib6_add_gc_list(f6i); +} + /* fib6_info must be locked by the caller, and fib6_info->fib6_table can be * NULL. If fib6_table is NULL, the fib6_info will no be inserted into the * list of GC candidates until it is inserted into a table. @@ -529,10 +555,8 @@ static inline void fib6_set_expires(struct fib6_info *f6i, static inline void fib6_clean_expires_locked(struct fib6_info *f6i) { - if (fib6_has_expires(f6i)) - hlist_del_init(&f6i->gc_link); - f6i->fib6_flags &= ~RTF_EXPIRES; - f6i->expires = 0; + fib6_del_gc_list(f6i); + __fib6_clean_expires(f6i); } static inline void fib6_clean_expires(struct fib6_info *f6i) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b132feae3393..dcaeb88d73aa 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg, rt->dst_nocount = true; if (cfg->fc_flags & RTF_EXPIRES) - fib6_set_expires_locked(rt, jiffies + - clock_t_to_jiffies(cfg->fc_expires)); + __fib6_set_expires(rt, jiffies + + clock_t_to_jiffies(cfg->fc_expires)); else - fib6_clean_expires_locked(rt); + __fib6_clean_expires(rt); if (cfg->fc_protocol == RTPROT_UNSPEC) cfg->fc_protocol = RTPROT_BOOT;