diff mbox series

[net-next,v3,1/2] net/ipv6: insert a f6i to a GC list only if the f6i is in a fib6_table tree.

Message ID 20231213213735.434249-2-thinker.li@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Fix dangling pointer at f6i->gc_link. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1616 this patch: 1616
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 1147 this patch: 1147
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1656 this patch: 1656
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 79 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kui-Feng Lee Dec. 13, 2023, 9:37 p.m. UTC
From: Kui-Feng Lee <thinker.li@gmail.com>

Check f6i->fib6_node and hlist_unhashed(&f6i->gc_link) before inserting a
f6i (fib6_info) to tb6_gc_hlist.

The current implementation checks if f6i->fib6_table is not NULL to
determines if a f6i is on a tree, however it is not enough. When a f6i is
removed from a fib6_table, f6i->fib6_table is not reset. However, fib6_node
is always reset when a f6i is removed from a fib6_table and is set when a
f6i is added to a fib6_table. So, f6i->fib6_node is a reliable way to
determine if a f6i is on a tree.

The current implementation checks RTF_EXPIRES on f6i->fib6_flags to
determine if a f6i is on a GC list. It also consider if the f6i is on a
tree before making a conclusion. This is indirect and complicated. The new
solution is checking hlist_unhashed(&f6i->gc_link), a clear evidence.

Putting them together, these changes provide more reliable signals to
determines if a f6i should be added/or removed to a GC list.

Fixes: 3dec89b14d37 ("net/ipv6: Remove expired routes with a separated list of routes.")
Reported-by: syzbot+c15aa445274af8674f41@syzkaller.appspotmail.com
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: dsahern@kernel.org
---
 include/net/ip6_fib.h | 46 ++++++++++++++++++++++++++++++++-----------
 net/ipv6/route.c      |  6 +++---
 2 files changed, 38 insertions(+), 14 deletions(-)

Comments

David Ahern Dec. 14, 2023, 6:11 a.m. UTC | #1
On 12/13/23 2:37 PM, thinker.li@gmail.com wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index b132feae3393..dcaeb88d73aa 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
>  		rt->dst_nocount = true;
>  
>  	if (cfg->fc_flags & RTF_EXPIRES)
> -		fib6_set_expires_locked(rt, jiffies +
> -					clock_t_to_jiffies(cfg->fc_expires));
> +		__fib6_set_expires(rt, jiffies +
> +				   clock_t_to_jiffies(cfg->fc_expires));
>  	else
> -		fib6_clean_expires_locked(rt);
> +		__fib6_clean_expires(rt);

as Eric noted in a past comment, the clean is not needed in this
function since memory is initialized to 0 (expires is never set).

Also, this patch set does not fundamentally change the logic, so it
cannot fix the bug reported in

https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/

please hold off future versions of this set until the problem in that
stack traced is fixed. I have tried a few things using RA's, but have
not been able to recreate UAF.
Kui-Feng Lee Dec. 14, 2023, 11:43 p.m. UTC | #2
On 12/13/23 22:11, David Ahern wrote:
> On 12/13/23 2:37 PM, thinker.li@gmail.com wrote:
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index b132feae3393..dcaeb88d73aa 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
>>   		rt->dst_nocount = true;
>>   
>>   	if (cfg->fc_flags & RTF_EXPIRES)
>> -		fib6_set_expires_locked(rt, jiffies +
>> -					clock_t_to_jiffies(cfg->fc_expires));
>> +		__fib6_set_expires(rt, jiffies +
>> +				   clock_t_to_jiffies(cfg->fc_expires));
>>   	else
>> -		fib6_clean_expires_locked(rt);
>> +		__fib6_clean_expires(rt);
> 
> as Eric noted in a past comment, the clean is not needed in this
> function since memory is initialized to 0 (expires is never set).
> 
> Also, this patch set does not fundamentally change the logic, so it
> cannot fix the bug reported in
> 
> https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/
> 
> please hold off future versions of this set until the problem in that
> stack traced is fixed. I have tried a few things using RA's, but have
> not been able to recreate UAF.
Sure
Kui-Feng Lee Dec. 15, 2023, 7:12 p.m. UTC | #3
On 12/13/23 22:11, David Ahern wrote:
> On 12/13/23 2:37 PM, thinker.li@gmail.com wrote:
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index b132feae3393..dcaeb88d73aa 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -3763,10 +3763,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
>>   		rt->dst_nocount = true;
>>   
>>   	if (cfg->fc_flags & RTF_EXPIRES)
>> -		fib6_set_expires_locked(rt, jiffies +
>> -					clock_t_to_jiffies(cfg->fc_expires));
>> +		__fib6_set_expires(rt, jiffies +
>> +				   clock_t_to_jiffies(cfg->fc_expires));
>>   	else
>> -		fib6_clean_expires_locked(rt);
>> +		__fib6_clean_expires(rt);
> 
> as Eric noted in a past comment, the clean is not needed in this
> function since memory is initialized to 0 (expires is never set).
> 
> Also, this patch set does not fundamentally change the logic, so it
> cannot fix the bug reported in
> 
> https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/
> 
> please hold off future versions of this set until the problem in that
> stack traced is fixed. I have tried a few things using RA's, but have
> not been able to recreate UAF.

I tried to reproduce the issue yesterday, according to the hypothesis
behind the patch of this thread. The following is the instructions
to reproduce the UAF issue. However, this instruction doesn't create
a crash at the place since the memory is still available even it has
been free. But, the log shows a UAF.

The patch at the end of this message is required to reproduce and
show UAF. The most critical change in the patch is to insert
a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us
a chance to manipulate the kernel to reproduce the UAF.

Here is my conclusion. There is no protection between finding
a route and changing the route in rt6_route_rcv(), including inserting
the entry to the gc list. It is possible to insert an entry that will be
free later to the gc list, causing a UAF. There is more explanations
along with the following logs.

Instructions:
      - Preparation
        - install ipv6toolkit on the host.
        - run qemu with a patched kernel as a guest through
          with the host through qemubr0 a bridge.
        - On the guest
          - stop systemd-networkd.service & systemd-networkd.socket if 
there are.
          - sysctl -w net.ipv6.conf.enp0s3.accept_ra=2
          - sysctl -w net.ipv6.conf.enp0s3.accept_ra_rt_info_max_plen=127
        - Assume the address of qemubr0 in the host is
          fe80::4ce9:92ff:fe27:75df.
      - Test
        - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the 
host
        - sleep 2; ip -6 route del fe82::/64                    # On the 
guest
        - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the 
host
        - ra6 -i qemubr0 -d ff02::1 -R 'fe81::/64#1#300' -t 0   # On the 
host
      - The step 3 should start immediately after step 2 since we have
        a gap of merely 3 seconds in the kernel.


The log generated by the test:

# this is the log triggered by step 1

qemu login: [    4.673867] __ip6_ins_rt fe80::5054:ff:fe12:3456/128
[   82.138139] rt = 0000000000000000
[   82.138731] fib6_info_alloc: ffff888103950200
[   82.139088] __ip6_ins_rt ::/0
[   82.139993] fib6_add: add ffff888103950200 to gc list 
ffff888103950238 pprev ffff888102878200 next 0000000000000000
[   82.141719] rt = ffff888103950200
[   82.141748] rt6_route_rcv
[   82.141748] fib6_info_alloc: ffff88810093be00
[   82.141748] __ip6_ins_rt fe82::/64
[   82.141748] rt6_route_rcv: route info ffff88810093be00, sleep 3s
[   85.121803] fib6_set_expires_locked: add ffff88810093be00 to gc list 
ffff88810093be38 pprev ffff888102878200 next ffff888103950238
[   85.146497] rt6_route_rcv: route info ffff88810093be00, after release


# This is the log triggered by step 2 & 3.
#
# The line containing fib6_clean_expires_locked is specifically
# triggered by step 2. Step 2 removes the entry from the tree and
# gc list. Step 3 free the entry by calling
# fib6_info_release() at the very end of rt6_route_rcv() since it is
# the last owner of that entry. fib6_info_destroy_rcu proves that.
#
# However, even the entry will be free later, rt6_route_rcv() still add
# the entry back to the gc list before freeing the entry. In other
# words, it create an entry (ffff88810093be38) that is free in
# a gc list.
#
# Keep an eye on ffff88810093be38 and ffff88810093be00. ffff88810093be00
# is the address of a fib6_info and ffff88810093be38 is the address of
# its gc_link.
#
# This log also complies with the log in
#
#  https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/
#
# The entry is free by the call of fib6_info_release() in
# rt6_route_rcv().

[  105.158140] rt = ffff888103950200
[  105.158590] rt6_route_rcv
[  105.158924] rt6_route_rcv: route info ffff88810093be00, sleep 3s
[  106.368875] fib6_clean_expires_locked: del ffff88810093be00 from gc 
list pprev ffff888102878200 next ffff888103950238
[  107.201815] fib6_set_expires_locked: add ffff88810093be00 to gc list 
ffff88810093be38 pprev ffff888102878200 next ffff888103950238
[  108.159815] rt6_route_rcv: route info ffff88810093be00, after release
[  108.168807] fib6_info_destroy_rcu ffff88810093be00


# This is the log triggered by step 4.
# The line containing fib6_clean_expires_locked shows the free entry
# mentioned in the previous part is still in the gc list. (pprev
# ffff88810093be38)
# Since fib6_clean_expires_locked() calls hlist_del_init() to remove
# an entry from the gc list, it will change the value of *pprev
# (ffff88810093be38). It causes an UAF case.

[  131.882130] rt = ffff888103950200
[  131.882567] __ip6_del_rt ::/0
[  131.882932] fib6_clean_expires_locked: del ffff888103950200 from gc 
list pprev ffff88810093be38 next 0000000000000000
[  131.883296] rt6_route_rcv
[  131.883296] fib6_info_alloc: ffff88810093be00
[  131.884517] __ip6_ins_rt fe81::/64
[  131.884517] rt6_route_rcv: route info ffff88810093be00, sleep 3s
[  134.305866] fib6_set_expires_locked: add ffff88810093be00 to gc list 
ffff88810093be38 pprev ffff888102878200 next ffff88810093be38
[  134.537866] rt6_route_rcv: route info ffff88810093be00, after release
[  134.896801] fib6_info_destroy_rcu ffff888103950200


# The following log is the kernel errors that printed after 10s seconds.

[  168.321812] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! 
[swapper/3:0]
[  196.289823] watchdog: BUG: soft lockup - CPU#3 stuck for 52s! 
[swapper/3:0]
[  214.244784] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  214.245723] rcu:     3-....: (7 ticks this GP) idle=198c/0/0x1 
softirq=2002/2003 fqs=12309
[  214.245774] rcu:     (detected by 2, t=60002 jiffies, g=1213, q=282 
ncpus=4)
[  240.321823] watchdog: BUG: soft lockup - CPU#3 stuck for 93s! 
[swapper/3:0]
[  268.353804] watchdog: BUG: soft lockup - CPU#3 stuck for 119s! 
[swapper/3:0]
[  296.388805] watchdog: BUG: soft lockup - CPU#3 stuck for 146s! 
[swapper/3:0]





diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1ba9f4ddf2f6..6e059ba3d2d0 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -510,8 +510,11 @@ static inline void fib6_set_expires_locked(struct 
fib6_info *f6i,

  	tb6 = f6i->fib6_table;
  	f6i->expires = expires;
-	if (tb6 && !fib6_has_expires(f6i))
+	if (tb6 && !fib6_has_expires(f6i)) {
  		hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist);
+		printk(KERN_CRIT "fib6_set_expires_locked: add %px to gc list %px 
pprev %px next %px\n",
+		       f6i, &f6i->gc_link, f6i->gc_link.pprev, f6i->gc_link.next);
+	}
  	f6i->fib6_flags |= RTF_EXPIRES;
  }

@@ -529,8 +532,11 @@ static inline void fib6_set_expires(struct 
fib6_info *f6i,

  static inline void fib6_clean_expires_locked(struct fib6_info *f6i)
  {
-	if (fib6_has_expires(f6i))
+	if (fib6_has_expires(f6i)) {
+		printk(KERN_CRIT "fib6_clean_expires_locked: del %px from gc list 
pprev %px next %px\n",
+		       f6i, f6i->gc_link.pprev, f6i->gc_link.next);
  		hlist_del_init(&f6i->gc_link);
+	}
  	f6i->fib6_flags &= ~RTF_EXPIRES;
  	f6i->expires = 0;
  }
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 28b01a068412..b275b9798b5e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -162,6 +162,8 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags, 
bool with_fib6_nh)

  	INIT_HLIST_NODE(&f6i->gc_link);

+	printk(KERN_CRIT "fib6_info_alloc: %px\n", f6i);
+
  	return f6i;
  }

@@ -178,6 +180,7 @@ void fib6_info_destroy_rcu(struct rcu_head *head)

  	ip_fib_metrics_put(f6i->fib6_metrics);
  	kfree(f6i);
+	printk(KERN_CRIT "fib6_info_destroy_rcu %px\n", f6i);
  }
  EXPORT_SYMBOL_GPL(fib6_info_destroy_rcu);

@@ -1486,8 +1489,10 @@ int fib6_add(struct fib6_node *root, struct 
fib6_info *rt,
  			list_add(&rt->nh_list, &rt->nh->f6i_list);
  		__fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net));

-		if (fib6_has_expires(rt))
+		if (fib6_has_expires(rt)) {
  			hlist_add_head(&rt->gc_link, &table->tb6_gc_hlist);
+			printk(KERN_CRIT "fib6_add: add %px to gc list %px pprev %px next 
%px\n", rt, &rt->gc_link, rt->gc_link.pprev, rt->gc_link.next);
+		}

  		fib6_start_gc(info->nl_net, rt);
  	}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b132feae3393..52283a80f79e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -935,6 +935,8 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, 
int len,
  	unsigned long lifetime;
  	struct fib6_info *rt;

+	printk(KERN_CRIT "rt6_route_rcv\n");
+
  	if (len < sizeof(struct route_info)) {
  		return -EINVAL;
  	}
@@ -989,12 +991,15 @@ int rt6_route_rcv(struct net_device *dev, u8 *opt, 
int len,
  				 (rt->fib6_flags & ~RTF_PREF_MASK) | RTF_PREF(pref);

  	if (rt) {
+		printk(KERN_CRIT "rt6_route_rcv: route info %px, sleep 3s\n", rt);
+		mdelay(3000);
  		if (!addrconf_finite_timeout(lifetime))
  			fib6_clean_expires(rt);
  		else
  			fib6_set_expires(rt, jiffies + HZ * lifetime);

  		fib6_info_release(rt);
+		printk(KERN_CRIT "rt6_route_rcv: route info %px, after release\n", rt);
  	}
  	return 0;
  }
@@ -1297,6 +1302,9 @@ static int __ip6_ins_rt(struct fib6_info *rt, 
struct nl_info *info,
  {
  	int err;
  	struct fib6_table *table;
+	if (rt)
+		printk(KERN_CRIT "__ip6_ins_rt %pI6c/%d\n",
+		       &rt->fib6_dst.addr, rt->fib6_dst.plen);

  	table = rt->fib6_table;
  	spin_lock_bh(&table->tb6_lock);
@@ -3855,6 +3863,9 @@ static int __ip6_del_rt(struct fib6_info *rt, 
struct nl_info *info)
  	struct net *net = info->nl_net;
  	struct fib6_table *table;
  	int err;
+	if (rt)
+		printk(KERN_CRIT "__ip6_del_rt %pI6c/%d\n",
+		       &rt->fib6_dst.addr, rt->fib6_dst.plen);

  	if (rt == net->ipv6.fib6_null_entry) {
  		err = -ENOENT;
@@ -4345,8 +4356,10 @@ struct fib6_info *rt6_get_dflt_router(struct net 
*net,
  		    ipv6_addr_equal(&nh->fib_nh_gw6, addr))
  			break;
  	}
-	if (rt && !fib6_info_hold_safe(rt))
+	if (rt && !fib6_info_hold_safe(rt)) {
  		rt = NULL;
+	}
+	printk(KERN_CRIT "rt = %px\n", rt);
  	rcu_read_unlock();
  	return rt;
  }
David Ahern Dec. 16, 2023, 6:36 p.m. UTC | #4
On 12/15/23 11:12 AM, Kui-Feng Lee wrote:
> 
> I tried to reproduce the issue yesterday, according to the hypothesis
> behind the patch of this thread. The following is the instructions
> to reproduce the UAF issue. However, this instruction doesn't create
> a crash at the place since the memory is still available even it has
> been free. But, the log shows a UAF.
> 
> The patch at the end of this message is required to reproduce and
> show UAF. The most critical change in the patch is to insert
> a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us
> a chance to manipulate the kernel to reproduce the UAF.
> 
> Here is my conclusion. There is no protection between finding
> a route and changing the route in rt6_route_rcv(), including inserting
> the entry to the gc list. It is possible to insert an entry that will be
> free later to the gc list, causing a UAF. There is more explanations
> along with the following logs.

TL;DR: I think 3dec89b14d37 should be reverted for 6.6 and 6.7
(selftests should be valid with or without this change) and you try
again for 6.9 (6.7 dev cycle is about to close).

###

I was successful in triggering UAF twice yesterday, but a slightly
different code path that in Eric's trace:

This is the WARN_ON for !hlist_unhashed in fib6_info_release:

[   46.926339] ------------[ cut here ]------------
[   46.926368] WARNING: CPU: 3 PID: 0 at include/net/ip6_fib.h:332
fib6_info_release+0x2a/0x43
[   46.926384] Modules linked in:
[   46.926393] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
6.7.0-rc4-debug+ #16
[   46.926399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.15.0-1 04/01/2014
[   46.926404] RIP: 0010:fib6_info_release+0x2a/0x43
[   46.926409] Code: 48 85 ff 74 3d 55 48 89 e5 53 48 89 fb 48 8d 7f 2c
e8 3e ff ff ff 84 c0 74 25 48 8d 7b 40 e8 33 11 8b ff 48 83 7b 40 00 74
02 <0f> 0b 48 8d bb a0 00 00 00 48 c7 c6 93 ea ae 81 e8 3f 00 66 ff 5b
[   46.926416] RSP: 0018:ffffc900002a8390 EFLAGS: 00010282
[   46.926422] RAX: 1ffff11002048b00 RBX: ffff888010245c00 RCX:
ffffffff81af8c9e
[   46.926426] RDX: dffffc0000000000 RSI: 0000000000000004 RDI:
ffff888010245c40
[   46.926431] RBP: ffffc900002a8398 R08: ffffed1002048b86 R09:
0000000000000001
[   46.926435] R10: ffffffff81af8be4 R11: ffff888010245c2f R12:
ffff88800aeec000
[   46.926439] R13: 0000000000000000 R14: ffff88801ab7c200 R15:
0000000000000020
[   46.926444] FS:  0000000000000000(0000) GS:ffff88806c000000(0000)
knlGS:0000000000000000
[   46.926448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   46.926452] CR2: 00007f897cc95fb0 CR3: 000000001d69d000 CR4:
0000000000750ef0
[   46.926458] PKRU: 55555554
[   46.926462] Call Trace:
[   46.926466]  <IRQ>
[   46.926470]  ? show_regs+0x5c/0x60
[   46.926478]  ? fib6_info_release+0x2a/0x43
[   46.926483]  ? __warn+0xcb/0x19c
[   46.926489]  ? fib6_info_release+0x2a/0x43
[   46.926495]  ? report_bug+0x114/0x186
[   46.926504]  ? handle_bug+0x40/0x6b
[   46.926510]  ? exc_invalid_op+0x19/0x41
[   46.926515]  ? asm_exc_invalid_op+0x1b/0x20
[   46.926524]  ? refcount_dec_and_test+0x15/0x43
[   46.926530]  ? fib6_info_release+0x23/0x43
[   46.926536]  ? fib6_info_release+0x2a/0x43
[   46.926542]  ndisc_router_discovery+0xf41/0xfa6


UAF here:

[   47.941928]
==================================================================
[   47.942548] BUG: KASAN: slab-use-after-free in
fib6_gc_all.constprop.0+0x19b/0x294
[   47.943178] Read of size 8 at addr ffff888010245c38 by task swapper/3/0

[   47.943866] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W
 6.7.0-rc4-debug+ #16
[   47.944548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.15.0-1 04/01/2014
[   47.945204] Call Trace:
[   47.945416]  <IRQ>
[   47.945595]  dump_stack_lvl+0x5b/0x82
[   47.945912]  print_address_description.constprop.0+0x7a/0x2eb
[   47.946391]  print_report+0x106/0x1e0
[   47.946703]  ? virt_to_slab+0x9/0x1a
[   47.947007]  ? kmem_cache_debug_flags+0x13/0x1f
[   47.947392]  ? kasan_complete_mode_report_info+0x19e/0x1a1
[   47.947850]  ? fib6_gc_all.constprop.0+0x19b/0x294
[   47.948254]  kasan_report+0x99/0xc4
[   47.948554]  ? fib6_gc_all.constprop.0+0x19b/0x294
[   47.948962]  __asan_load8+0x77/0x79
[   47.949262]  fib6_gc_all.constprop.0+0x19b/0x294


for a route added here:

[   47.970092] Allocated by task 0:
[   47.970366]  stack_trace_save+0x8d/0xbb
[   47.970373]  kasan_save_stack+0x26/0x46
[   47.970379]  kasan_set_track+0x25/0x2b
[   47.970385]  kasan_save_alloc_info+0x1e/0x21
[   47.970392]  ____kasan_kmalloc+0x6f/0x7b
[   47.970398]  __kasan_kmalloc+0x9/0xb
[   47.970404]  __kmalloc+0x91/0xbf
[   47.970411]  kzalloc+0xf/0x11
[   47.970416]  fib6_info_alloc+0x26/0xa1
[   47.970422]  ip6_route_info_create+0x266/0x6c5
[   47.970428]  ip6_route_add+0x14/0x46
[   47.970433]  rt6_add_dflt_router+0x123/0x1bd
[   47.970439]  ndisc_router_discovery+0x5a4/0xfa6
[   47.970447]  ndisc_rcv+0x1a2/0x1b5

This is just aggressive RA's and aggressive gc with a hack to the stack
to toggle lifetime or metric on every RA (something that can be scripted
with an ra command - set and reset lifetime in every other RA and change
the metric in every RA).

I was targeting this code path because I noticed rt6_add_dflt_router
sets RTF_EXPIRES but does not pass in the expires value. There are races
(like the one you found with a different code path) in the handling of
RTF_EXPIRES, setting the timer, running gc and adding the route entry to
the gc list.

In short there are a number of places in the RA path that need the
lifetime handling cleaned up first to make it all consistent with the
idea of using a linked list to track entries with an expires tag.
Kui-Feng Lee Dec. 18, 2023, 1:05 a.m. UTC | #5
On 12/16/23 10:36, David Ahern wrote:
> On 12/15/23 11:12 AM, Kui-Feng Lee wrote:
>>
>> I tried to reproduce the issue yesterday, according to the hypothesis
>> behind the patch of this thread. The following is the instructions
>> to reproduce the UAF issue. However, this instruction doesn't create
>> a crash at the place since the memory is still available even it has
>> been free. But, the log shows a UAF.
>>
>> The patch at the end of this message is required to reproduce and
>> show UAF. The most critical change in the patch is to insert
>> a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us
>> a chance to manipulate the kernel to reproduce the UAF.
>>
>> Here is my conclusion. There is no protection between finding
>> a route and changing the route in rt6_route_rcv(), including inserting
>> the entry to the gc list. It is possible to insert an entry that will be
>> free later to the gc list, causing a UAF. There is more explanations
>> along with the following logs.
> 
> TL;DR: I think 3dec89b14d37 should be reverted for 6.6 and 6.7
> (selftests should be valid with or without this change) and you try
> again for 6.9 (6.7 dev cycle is about to close).
> 
> ###
> 
> I was successful in triggering UAF twice yesterday, but a slightly
> different code path that in Eric's trace:
> 
> This is the WARN_ON for !hlist_unhashed in fib6_info_release:
> 
> [   46.926339] ------------[ cut here ]------------
> [   46.926368] WARNING: CPU: 3 PID: 0 at include/net/ip6_fib.h:332
> fib6_info_release+0x2a/0x43
> [   46.926384] Modules linked in:
> [   46.926393] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> 6.7.0-rc4-debug+ #16
> [   46.926399] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.15.0-1 04/01/2014
> [   46.926404] RIP: 0010:fib6_info_release+0x2a/0x43
> [   46.926409] Code: 48 85 ff 74 3d 55 48 89 e5 53 48 89 fb 48 8d 7f 2c
> e8 3e ff ff ff 84 c0 74 25 48 8d 7b 40 e8 33 11 8b ff 48 83 7b 40 00 74
> 02 <0f> 0b 48 8d bb a0 00 00 00 48 c7 c6 93 ea ae 81 e8 3f 00 66 ff 5b
> [   46.926416] RSP: 0018:ffffc900002a8390 EFLAGS: 00010282
> [   46.926422] RAX: 1ffff11002048b00 RBX: ffff888010245c00 RCX:
> ffffffff81af8c9e
> [   46.926426] RDX: dffffc0000000000 RSI: 0000000000000004 RDI:
> ffff888010245c40
> [   46.926431] RBP: ffffc900002a8398 R08: ffffed1002048b86 R09:
> 0000000000000001
> [   46.926435] R10: ffffffff81af8be4 R11: ffff888010245c2f R12:
> ffff88800aeec000
> [   46.926439] R13: 0000000000000000 R14: ffff88801ab7c200 R15:
> 0000000000000020
> [   46.926444] FS:  0000000000000000(0000) GS:ffff88806c000000(0000)
> knlGS:0000000000000000
> [   46.926448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   46.926452] CR2: 00007f897cc95fb0 CR3: 000000001d69d000 CR4:
> 0000000000750ef0
> [   46.926458] PKRU: 55555554
> [   46.926462] Call Trace:
> [   46.926466]  <IRQ>
> [   46.926470]  ? show_regs+0x5c/0x60
> [   46.926478]  ? fib6_info_release+0x2a/0x43
> [   46.926483]  ? __warn+0xcb/0x19c
> [   46.926489]  ? fib6_info_release+0x2a/0x43
> [   46.926495]  ? report_bug+0x114/0x186
> [   46.926504]  ? handle_bug+0x40/0x6b
> [   46.926510]  ? exc_invalid_op+0x19/0x41
> [   46.926515]  ? asm_exc_invalid_op+0x1b/0x20
> [   46.926524]  ? refcount_dec_and_test+0x15/0x43
> [   46.926530]  ? fib6_info_release+0x23/0x43
> [   46.926536]  ? fib6_info_release+0x2a/0x43
> [   46.926542]  ndisc_router_discovery+0xf41/0xfa6
> 
> 
> UAF here:
> 
> [   47.941928]
> ==================================================================
> [   47.942548] BUG: KASAN: slab-use-after-free in
> fib6_gc_all.constprop.0+0x19b/0x294
> [   47.943178] Read of size 8 at addr ffff888010245c38 by task swapper/3/0
> 
> [   47.943866] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W
>   6.7.0-rc4-debug+ #16
> [   47.944548] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.15.0-1 04/01/2014
> [   47.945204] Call Trace:
> [   47.945416]  <IRQ>
> [   47.945595]  dump_stack_lvl+0x5b/0x82
> [   47.945912]  print_address_description.constprop.0+0x7a/0x2eb
> [   47.946391]  print_report+0x106/0x1e0
> [   47.946703]  ? virt_to_slab+0x9/0x1a
> [   47.947007]  ? kmem_cache_debug_flags+0x13/0x1f
> [   47.947392]  ? kasan_complete_mode_report_info+0x19e/0x1a1
> [   47.947850]  ? fib6_gc_all.constprop.0+0x19b/0x294
> [   47.948254]  kasan_report+0x99/0xc4
> [   47.948554]  ? fib6_gc_all.constprop.0+0x19b/0x294
> [   47.948962]  __asan_load8+0x77/0x79
> [   47.949262]  fib6_gc_all.constprop.0+0x19b/0x294
> 
> 
> for a route added here:
> 
> [   47.970092] Allocated by task 0:
> [   47.970366]  stack_trace_save+0x8d/0xbb
> [   47.970373]  kasan_save_stack+0x26/0x46
> [   47.970379]  kasan_set_track+0x25/0x2b
> [   47.970385]  kasan_save_alloc_info+0x1e/0x21
> [   47.970392]  ____kasan_kmalloc+0x6f/0x7b
> [   47.970398]  __kasan_kmalloc+0x9/0xb
> [   47.970404]  __kmalloc+0x91/0xbf
> [   47.970411]  kzalloc+0xf/0x11
> [   47.970416]  fib6_info_alloc+0x26/0xa1
> [   47.970422]  ip6_route_info_create+0x266/0x6c5
> [   47.970428]  ip6_route_add+0x14/0x46
> [   47.970433]  rt6_add_dflt_router+0x123/0x1bd
> [   47.970439]  ndisc_router_discovery+0x5a4/0xfa6
> [   47.970447]  ndisc_rcv+0x1a2/0x1b5
> 
> This is just aggressive RA's and aggressive gc with a hack to the stack
> to toggle lifetime or metric on every RA (something that can be scripted
> with an ra command - set and reset lifetime in every other RA and change
> the metric in every RA).
> 
> I was targeting this code path because I noticed rt6_add_dflt_router
> sets RTF_EXPIRES but does not pass in the expires value. There are races
> (like the one you found with a different code path) in the handling of
> RTF_EXPIRES, setting the timer, running gc and adding the route entry to
> the gc list.

I am not sure what you mean a race in rt6_add_dflt_router().
In this function, it calls ip6_route_add() to add a default route
with RTF_EXPIRES. In ip6_route_add(), it calls ip6_route_info_create()
to create a f6i, and calls __ip6_ins_rt() to add the entry to gc list.

Although there is a gap between ip6_route_info_create() and
__ip6_ins_rt() without any protection, it should not cause a race.
The newly created entry is not available to the rest of the system
until __ip6_ins_rt() adds it to the tree.  And, adding the entry
to the gc list and adding the entry to the tree are performed
atomically, being protected by table->tb6_lock. So, it should not have
a race between adding to tree and adding to gc list if this is what
you mean.

By the way, do you happen to have a chance to try the patch here in
you tests?  It fixed the issue for my test scenario. According to
your stacktraces, it should be the same issue happening in my test.
Somewhere adds a entry to gc list by faultily believing that
the entry is still on a tree. And, the patch here fixes this
faultily believe.

> 
> In short there are a number of places in the RA path that need the
> lifetime handling cleaned up first to make it all consistent with the
> idea of using a linked list to track entries with an expires tag.
Kui-Feng Lee Dec. 18, 2023, 1:16 a.m. UTC | #6
On 12/15/23 11:12, Kui-Feng Lee wrote:
> 
> 
> On 12/13/23 22:11, David Ahern wrote:
>> On 12/13/23 2:37 PM, thinker.li@gmail.com wrote:
>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index b132feae3393..dcaeb88d73aa 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>>> @@ -3763,10 +3763,10 @@ static struct fib6_info 
>>> *ip6_route_info_create(struct fib6_config *cfg,
>>>           rt->dst_nocount = true;
>>>       if (cfg->fc_flags & RTF_EXPIRES)
>>> -        fib6_set_expires_locked(rt, jiffies +
>>> -                    clock_t_to_jiffies(cfg->fc_expires));
>>> +        __fib6_set_expires(rt, jiffies +
>>> +                   clock_t_to_jiffies(cfg->fc_expires));
>>>       else
>>> -        fib6_clean_expires_locked(rt);
>>> +        __fib6_clean_expires(rt);
>>
>> as Eric noted in a past comment, the clean is not needed in this
>> function since memory is initialized to 0 (expires is never set).
>>
>> Also, this patch set does not fundamentally change the logic, so it
>> cannot fix the bug reported in
>>
>> https://lore.kernel.org/all/20231205173250.2982846-1-edumazet@google.com/
>>
>> please hold off future versions of this set until the problem in that
>> stack traced is fixed. I have tried a few things using RA's, but have
>> not been able to recreate UAF.
> 
> I tried to reproduce the issue yesterday, according to the hypothesis
> behind the patch of this thread. The following is the instructions
> to reproduce the UAF issue. However, this instruction doesn't create
> a crash at the place since the memory is still available even it has
> been free. But, the log shows a UAF.
> 
> The patch at the end of this message is required to reproduce and
> show UAF. The most critical change in the patch is to insert
> a 'mdelay(3000)' to sleep 3s in rt6_route_rcv(). That gives us
> a chance to manipulate the kernel to reproduce the UAF.
> 
> Here is my conclusion. There is no protection between finding
> a route and changing the route in rt6_route_rcv(), including inserting
> the entry to the gc list. It is possible to insert an entry that will be
> free later to the gc list, causing a UAF. There is more explanations
> along with the following logs.
> 
> Instructions:
>       - Preparation
>         - install ipv6toolkit on the host.
>         - run qemu with a patched kernel as a guest through
>           with the host through qemubr0 a bridge.
>         - On the guest
>           - stop systemd-networkd.service & systemd-networkd.socket if 
> there are.
>           - sysctl -w net.ipv6.conf.enp0s3.accept_ra=2
>           - sysctl -w net.ipv6.conf.enp0s3.accept_ra_rt_info_max_plen=127
>         - Assume the address of qemubr0 in the host is
>           fe80::4ce9:92ff:fe27:75df.
>       - Test
>         - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the 
> host
>         - sleep 2; ip -6 route del fe82::/64                    # On the 
> guest
>         - ra6 -i qemubr0 -d ff02::1 -R 'fe82::/64#1#300' -t 300 # On the 
> host
>         - ra6 -i qemubr0 -d ff02::1 -R 'fe81::/64#1#300' -t 0   # On the 
> host
>       - The step 3 should start immediately after step 2 since we have
>         a gap of merely 3 seconds in the kernel.
> 

I did the same test with the fix patch along with this thread.
The following is the log I got.

# This is the log printed by the step 1.
# ffff888108066600 was add to the gc list.
[    4.596875] __ip6_ins_rt fe80::5054:ff:fe12:3456/128
[   38.925482] rt = 0000000000000000
[   38.925916] fib6_info_alloc: ffff888108066e00
[   38.926441] __ip6_ins_rt ::/0
[   38.926441] fib6_add: add ffff888108066e00 to gc list 
ffff888108066e38 pprev ffff888102a02800 next 0000000000000000
[   38.927084] rt = ffff888108066e00
[   38.927084] rt6_route_rcv
[   38.927084] fib6_info_alloc: ffff888108066600
[   38.929333] __ip6_ins_rt fe82::/64
[   38.929333] rt6_route_rcv: route info ffff888108066600, sleep 3s
[   40.948831] fib6_set_expires_locked: add ffff888108066600 to gc list 
ffff888108066638 pprev ffff888102a02800 next ffff888108066e38
[   41.932501] rt6_route_rcv: route info ffff888108066600, after release

# This is the log printed by the step 2 & 3.
# The entry (ffff888108066600) removed from the gc list by
# fib6_clean_expires_locked was not added back to the gc list again.
[   68.173441] rt = ffff888108066e00
[   68.173883] rt6_route_rcv
[   68.174245] rt6_route_rcv: route info ffff888108066600, sleep 3s
[   69.389112] fib6_clean_expires_locked: del ffff888108066600 from gc 
list pprev ffff888102a02800 next ffff888108066e38
[   70.900831] rt6_route_rcv: route info ffff888108066600, after release
[   71.182822] fib6_info_destroy_rcu ffff888108066600

# This is the log printed by the step 4.
[  109.017446] rt = ffff888108066e00
[  109.017893] __ip6_del_rt ::/0
[  109.018288] fib6_clean_expires_locked: del ffff888108066e00 from gc 
list pprev ffff888102a02800 next 0000000000000000
[  109.019471] rt6_route_rcv
[  109.019471] fib6_info_alloc: ffff888108066600
[  109.019471] __ip6_ins_rt fe81::/64
[  109.019471] rt6_route_rcv: route info ffff888108066600, sleep 3s
[  111.028831] fib6_set_expires_locked: add ffff888108066600 to gc list 
ffff888108066638 pprev ffff888102a02800 next 0000000000000000
[  112.023607] rt6_route_rcv: route info ffff888108066600, after release
[  112.031825] fib6_info_destroy_rcu ffff888108066e00
diff mbox series

Patch

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1ba9f4ddf2f6..1213722c394f 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -500,21 +500,47 @@  void fib6_gc_cleanup(void);
 
 int fib6_init(void);
 
-/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
- * NULL.
- */
-static inline void fib6_set_expires_locked(struct fib6_info *f6i,
-					   unsigned long expires)
+static inline void fib6_add_gc_list(struct fib6_info *f6i)
 {
 	struct fib6_table *tb6;
 
 	tb6 = f6i->fib6_table;
-	f6i->expires = expires;
-	if (tb6 && !fib6_has_expires(f6i))
+	if (tb6 &&
+	    rcu_dereference_protected(f6i->fib6_node,
+				      lockdep_is_held(&tb6->tb6_lock)) &&
+	    hlist_unhashed(&f6i->gc_link))
 		hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist);
+}
+
+static inline void fib6_del_gc_list(struct fib6_info *f6i)
+{
+	if (!hlist_unhashed(&f6i->gc_link))
+		hlist_del_init(&f6i->gc_link);
+}
+
+static inline void __fib6_set_expires(struct fib6_info *f6i,
+				      unsigned long expires)
+{
+	f6i->expires = expires;
 	f6i->fib6_flags |= RTF_EXPIRES;
 }
 
+static inline void __fib6_clean_expires(struct fib6_info *f6i)
+{
+	f6i->fib6_flags &= ~RTF_EXPIRES;
+	f6i->expires = 0;
+}
+
+/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
+ * NULL.
+ */
+static inline void fib6_set_expires_locked(struct fib6_info *f6i,
+					   unsigned long expires)
+{
+	__fib6_set_expires(f6i, expires);
+	fib6_add_gc_list(f6i);
+}
+
 /* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
  * NULL.  If fib6_table is NULL, the fib6_info will no be inserted into the
  * list of GC candidates until it is inserted into a table.
@@ -529,10 +555,8 @@  static inline void fib6_set_expires(struct fib6_info *f6i,
 
 static inline void fib6_clean_expires_locked(struct fib6_info *f6i)
 {
-	if (fib6_has_expires(f6i))
-		hlist_del_init(&f6i->gc_link);
-	f6i->fib6_flags &= ~RTF_EXPIRES;
-	f6i->expires = 0;
+	fib6_del_gc_list(f6i);
+	__fib6_clean_expires(f6i);
 }
 
 static inline void fib6_clean_expires(struct fib6_info *f6i)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b132feae3393..dcaeb88d73aa 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3763,10 +3763,10 @@  static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
 		rt->dst_nocount = true;
 
 	if (cfg->fc_flags & RTF_EXPIRES)
-		fib6_set_expires_locked(rt, jiffies +
-					clock_t_to_jiffies(cfg->fc_expires));
+		__fib6_set_expires(rt, jiffies +
+				   clock_t_to_jiffies(cfg->fc_expires));
 	else
-		fib6_clean_expires_locked(rt);
+		__fib6_clean_expires(rt);
 
 	if (cfg->fc_protocol == RTPROT_UNSPEC)
 		cfg->fc_protocol = RTPROT_BOOT;