diff mbox series

[v2,2/2] netconsole: allow selection of egress interface via MAC address

Message ID 20250204-netconsole-v2-2-5ef5eb5f6056@purestorage.com (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show
Series netconsole: allow selection of egress interface via MAC address | expand

Commit Message

Uday Shankar Feb. 4, 2025, 9:41 p.m. UTC
Currently, netconsole has two methods of configuration - module
parameter and configfs. The former interface allows for netconsole
activation earlier during boot (by specifying the module parameter on
the kernel command line), so it is preferred for debugging issues which
arise before userspace is up/the configfs interface can be used. The
module parameter syntax requires specifying the egress interface name.
This requirement makes it hard to use for a couple reasons:
- The egress interface name can be hard or impossible to predict. For
  example, installing a new network card in a system can change the
  interface names assigned by the kernel.
- When constructing the module parameter, one may have trouble
  determining the original (kernel-assigned) name of the interface
  (which is the name that should be given to netconsole) if some stable
  interface naming scheme is in effect. A human can usually look at
  kernel logs to determine the original name, but this is very painful
  if automation is constructing the parameter.

For these reasons, allow selection of the egress interface via MAC
address when configuring netconsole using the module parameter. Update
the netconsole documentation with an example of the new syntax.
Selection of egress interface by MAC address via configfs is far less
interesting (since when this interface can be used, one should be able
to easily convert between MAC address and interface name), so it is left
unimplemented.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
---
 Documentation/networking/netconsole.rst |  6 +++-
 include/linux/netpoll.h                 |  6 ++++
 net/core/netpoll.c                      | 51 +++++++++++++++++++++++++--------
 3 files changed, 50 insertions(+), 13 deletions(-)

Comments

Breno Leitao Feb. 5, 2025, 7:07 p.m. UTC | #1
On Tue, Feb 04, 2025 at 02:41:45PM -0700, Uday Shankar wrote:
> Currently, netconsole has two methods of configuration - module
> parameter and configfs. The former interface allows for netconsole
> activation earlier during boot (by specifying the module parameter on
> the kernel command line), so it is preferred for debugging issues which
> arise before userspace is up/the configfs interface can be used. The
> module parameter syntax requires specifying the egress interface name.
> This requirement makes it hard to use for a couple reasons:
> - The egress interface name can be hard or impossible to predict. For
>   example, installing a new network card in a system can change the
>   interface names assigned by the kernel.
> - When constructing the module parameter, one may have trouble
>   determining the original (kernel-assigned) name of the interface
>   (which is the name that should be given to netconsole) if some stable
>   interface naming scheme is in effect. A human can usually look at
>   kernel logs to determine the original name, but this is very painful
>   if automation is constructing the parameter.
> 
> For these reasons, allow selection of the egress interface via MAC
> address when configuring netconsole using the module parameter. Update
> the netconsole documentation with an example of the new syntax.
> Selection of egress interface by MAC address via configfs is far less
> interesting (since when this interface can be used, one should be able
> to easily convert between MAC address and interface name), so it is left
> unimplemented.
> 
> Signed-off-by: Uday Shankar <ushankar@purestorage.com>

Reviewed-by: Breno Leitao <leitao@debian.org>
Tested-by: Breno Leitao <leitao@debian.org>

>  int netpoll_setup(struct netpoll *np)
>  {
> +	struct net *net = current->nsproxy->net_ns;
>  	struct net_device *ndev = NULL;
>  	bool ip_overwritten = false;
> +	char buf[MAC_ADDR_LEN + 1];
>  	struct in_device *in_dev;
>  	int err;
>  
>  	rtnl_lock();
> -	if (np->dev_name[0]) {
> -		struct net *net = current->nsproxy->net_ns;
> +	if (np->dev_name[0])
>  		ndev = __dev_get_by_name(net, np->dev_name);
> -	}
> +	else if (is_valid_ether_addr(np->dev_mac))
> +		ndev = dev_getbyhwaddr_rcu(net, ARPHRD_ETHER, np->dev_mac);

You do not have the RCU read lock here. You have the rtnl(), which is
sufficient, but, CONFIG_PROVE_RCU_LIST will show something as:

	WARNING: suspicious RCU usage
	6.13.0-09701-g6610c7be45bb-dirty #18 Not tainted
	-----------------------------
	net/core/dev.c:1143 RCU-list traversed in non-reader section!!
	other info that might help us debug this:
	rcu_scheduler_active = 2, debug_locks = 1
	1 lock held by swapper/0/1:
	 #0: ffffffff832795b8 (rtnl_mutex){+.+.}-{4:4}, at: netpoll_setup+0x48/0x540
	stack backtrace:
	CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-virtme-09701-g6610c7be45bb-dirty #18
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
	Call Trace:
	 <TASK>
	 dump_stack_lvl+0x9f/0xf0
	 lockdep_rcu_suspicious+0x11a/0x150
	 dev_getbyhwaddr_rcu+0xb6/0xc0
	 netpoll_setup+0x8a/0x540
	 ? netpoll_parse_options+0x2bd/0x310

This is not a problem per-se, since you have RTNL. We probably need to
tell for_each_netdev_rcu() to not comply about "RCU-list traversed in
non-reader section" if RTNL is held. Not sure why we didn't hit in the
test infrastructure, tho:

	https://patchwork.kernel.org/project/netdevbpf/patch/20250204-netconsole-v2-2-5ef5eb5f6056@purestorage.com/

Anyway, no action item for you here. I am talking to Jakub on a way to
solve it, and I should send a fix soon.

Thanks for the patch,
--breno
Uday Shankar Feb. 5, 2025, 8:46 p.m. UTC | #2
On Wed, Feb 05, 2025 at 11:07:45AM -0800, Breno Leitao wrote:
> > +	else if (is_valid_ether_addr(np->dev_mac))
> > +		ndev = dev_getbyhwaddr_rcu(net, ARPHRD_ETHER, np->dev_mac);
> 
> You do not have the RCU read lock here. You have the rtnl(), which is
> sufficient, but, CONFIG_PROVE_RCU_LIST will show something as:
> 
> 	WARNING: suspicious RCU usage
> 	6.13.0-09701-g6610c7be45bb-dirty #18 Not tainted
> 	-----------------------------
> 	net/core/dev.c:1143 RCU-list traversed in non-reader section!!
> 	other info that might help us debug this:
> 	rcu_scheduler_active = 2, debug_locks = 1
> 	1 lock held by swapper/0/1:
> 	 #0: ffffffff832795b8 (rtnl_mutex){+.+.}-{4:4}, at: netpoll_setup+0x48/0x540
> 	stack backtrace:
> 	CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-virtme-09701-g6610c7be45bb-dirty #18
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> 	Call Trace:
> 	 <TASK>
> 	 dump_stack_lvl+0x9f/0xf0
> 	 lockdep_rcu_suspicious+0x11a/0x150
> 	 dev_getbyhwaddr_rcu+0xb6/0xc0
> 	 netpoll_setup+0x8a/0x540
> 	 ? netpoll_parse_options+0x2bd/0x310
> 
> This is not a problem per-se, since you have RTNL. We probably need to
> tell for_each_netdev_rcu() to not comply about "RCU-list traversed in
> non-reader section" if RTNL is held. Not sure why we didn't hit in the
> test infrastructure, tho:
> 
> 	https://patchwork.kernel.org/project/netdevbpf/patch/20250204-netconsole-v2-2-5ef5eb5f6056@purestorage.com/

I don't think there is an automated test that will hit this path yet. I
guess you got this trace from your manual testing?

> 
> Anyway, no action item for you here. I am talking to Jakub on a way to
> solve it, and I should send a fix soon.

/**
 * list_for_each_entry_rcu	-	iterate over rcu list of given type
 * @pos:	the type * to use as a loop cursor.
 * @head:	the head for your list.
 * @member:	the name of the list_head within the struct.
 * @cond:	optional lockdep expression if called from non-RCU protection.
 *
 * This list-traversal primitive may safely run concurrently with
 * the _rcu list-mutation primitives such as list_add_rcu()
 * as long as the traversal is guarded by rcu_read_lock().
 */
#define list_for_each_entry_rcu(pos, head, member, cond...)		\
	for (__list_check_rcu(dummy, ## cond, 0),			\
	     pos = list_entry_rcu((head)->next, typeof(*pos), member);	\
		&pos->member != (head);					\
		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

If we do something like

list_for_each_entry_rcu(..., lockdep_rtnl_is_held())
	...

I think that code will be okay with being called with either rcu or rtnl
held. Of course, we need to plumb it through the net-specific helpers.
diff mbox series

Patch

diff --git a/Documentation/networking/netconsole.rst b/Documentation/networking/netconsole.rst
index 94c4680fdf3e7e1a0020d11b44547acfd68072a5..90a1bbb52918a0163828f4e96c89781e0bc6856b 100644
--- a/Documentation/networking/netconsole.rst
+++ b/Documentation/networking/netconsole.rst
@@ -45,7 +45,7 @@  following format::
 	r             if present, prepend kernel version (release) to the message
 	src-port      source for UDP packets (defaults to 6665)
 	src-ip        source IP to use (interface address)
-	dev           network interface (eth0)
+	dev           network interface name (eth0) or MAC address
 	tgt-port      port for logging agent (6666)
 	tgt-ip        IP address for logging agent
 	tgt-macaddr   ethernet MAC address for logging agent (broadcast)
@@ -62,6 +62,10 @@  or using IPv6::
 
  insmod netconsole netconsole=@/,@fd00:1:2:3::1/
 
+or using a MAC address to select the egress interface::
+
+   linux netconsole=4444@10.0.0.1/22:33:44:55:66:77,9353@10.0.0.2/12:34:56:78:9a:bc
+
 It also supports logging to multiple remote agents by specifying
 parameters for the multiple agents separated by semicolons and the
 complete string enclosed in "quotes", thusly::
diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index f91e50a76efd4b016381c632456397eea1ea877f..1ade65b59be49cfdcf86ed6e938287b949aa9f58 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -25,7 +25,13 @@  union inet_addr {
 struct netpoll {
 	struct net_device *dev;
 	netdevice_tracker dev_tracker;
+	/*
+	 * Either dev_name or dev_mac can be used to specify the local
+	 * interface - dev_name is used if it is a nonempty string, else
+	 * dev_mac is used.
+	 */
 	char dev_name[IFNAMSIZ];
+	u8 dev_mac[ETH_ALEN];
 	const char *name;
 
 	union inet_addr local_ip, remote_ip;
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 62b4041aae1ae8c7dc47c89fb40b14bbd4ad0e0e..c4b329fa874748eddff64bbba13902893faebbce 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -501,7 +501,8 @@  void netpoll_print_options(struct netpoll *np)
 		np_info(np, "local IPv6 address %pI6c\n", &np->local_ip.in6);
 	else
 		np_info(np, "local IPv4 address %pI4\n", &np->local_ip.ip);
-	np_info(np, "interface '%s'\n", np->dev_name);
+	np_info(np, "interface name '%s'\n", np->dev_name);
+	np_info(np, "local ethernet address '%pM'\n", np->dev_mac);
 	np_info(np, "remote port %d\n", np->remote_port);
 	if (np->ipv6)
 		np_info(np, "remote IPv6 address %pI6c\n", &np->remote_ip.in6);
@@ -570,11 +571,18 @@  int netpoll_parse_options(struct netpoll *np, char *opt)
 	cur++;
 
 	if (*cur != ',') {
-		/* parse out dev name */
+		/* parse out dev_name or dev_mac */
 		if ((delim = strchr(cur, ',')) == NULL)
 			goto parse_failed;
 		*delim = 0;
-		strscpy(np->dev_name, cur, sizeof(np->dev_name));
+
+		np->dev_name[0] = '\0';
+		eth_broadcast_addr(np->dev_mac);
+		if (!strchr(cur, ':'))
+			strscpy(np->dev_name, cur, sizeof(np->dev_name));
+		else if (!mac_pton(cur, np->dev_mac))
+			goto parse_failed;
+
 		cur = delim;
 	}
 	cur++;
@@ -679,27 +687,45 @@  int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
 }
 EXPORT_SYMBOL_GPL(__netpoll_setup);
 
+/*
+ * Returns a pointer to a string representation of the identifier used
+ * to select the egress interface for the given netpoll instance. buf
+ * must be a buffer of length at least MAC_ADDR_LEN + 1.
+ */
+static char *egress_dev(struct netpoll *np, char *buf)
+{
+	if (np->dev_name[0])
+		return np->dev_name;
+
+	snprintf(buf, MAC_ADDR_LEN, "%pM", np->dev_mac);
+	return buf;
+}
+
 int netpoll_setup(struct netpoll *np)
 {
+	struct net *net = current->nsproxy->net_ns;
 	struct net_device *ndev = NULL;
 	bool ip_overwritten = false;
+	char buf[MAC_ADDR_LEN + 1];
 	struct in_device *in_dev;
 	int err;
 
 	rtnl_lock();
-	if (np->dev_name[0]) {
-		struct net *net = current->nsproxy->net_ns;
+	if (np->dev_name[0])
 		ndev = __dev_get_by_name(net, np->dev_name);
-	}
+	else if (is_valid_ether_addr(np->dev_mac))
+		ndev = dev_getbyhwaddr_rcu(net, ARPHRD_ETHER, np->dev_mac);
+
 	if (!ndev) {
-		np_err(np, "%s doesn't exist, aborting\n", np->dev_name);
+		np_err(np, "%s doesn't exist, aborting\n", egress_dev(np, buf));
 		err = -ENODEV;
 		goto unlock;
 	}
 	netdev_hold(ndev, &np->dev_tracker, GFP_KERNEL);
 
 	if (netdev_master_upper_dev_get(ndev)) {
-		np_err(np, "%s is a slave device, aborting\n", np->dev_name);
+		np_err(np, "%s is a slave device, aborting\n",
+		       egress_dev(np, buf));
 		err = -EBUSY;
 		goto put;
 	}
@@ -707,7 +733,8 @@  int netpoll_setup(struct netpoll *np)
 	if (!netif_running(ndev)) {
 		unsigned long atmost;
 
-		np_info(np, "device %s not up yet, forcing it\n", np->dev_name);
+		np_info(np, "device %s not up yet, forcing it\n",
+			egress_dev(np, buf));
 
 		err = dev_open(ndev, NULL);
 
@@ -741,7 +768,7 @@  int netpoll_setup(struct netpoll *np)
 			if (!ifa) {
 put_noaddr:
 				np_err(np, "no IP address for %s, aborting\n",
-				       np->dev_name);
+				       egress_dev(np, buf));
 				err = -EDESTADDRREQ;
 				goto put;
 			}
@@ -772,13 +799,13 @@  int netpoll_setup(struct netpoll *np)
 			}
 			if (err) {
 				np_err(np, "no IPv6 address for %s, aborting\n",
-				       np->dev_name);
+				       egress_dev(np, buf));
 				goto put;
 			} else
 				np_info(np, "local IPv6 %pI6c\n", &np->local_ip.in6);
 #else
 			np_err(np, "IPv6 is not supported %s, aborting\n",
-			       np->dev_name);
+			       egress_dev(np, buf));
 			err = -EINVAL;
 			goto put;
 #endif