[RESEND,v2] can: netlink: prevent incoherent can configuration in case of early return

Message ID 20210903071704.455855-1-mailhol.vincent@wanadoo.fr (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers

Checks

Context Check Description
netdev/tree_selection success Series ignored based on subject

Commit Message

Vincent Mailhol Sept. 3, 2021, 7:17 a.m. UTC
struct can_priv has a set of flags (can_priv::ctrlmode) which are
correlated with the other fields of the structure. In
can_changelink(), those flags are set first and copied to can_priv. If
the function has to return early, for example due to an out of range
value provided by the user, then the global configuration might become
incoherent.

Example: the user provides an out of range dbitrate (e.g. 20
Mbps). The command fails (-EINVAL), however the FD flag was already
set resulting in a configuration where FD is on but the databittiming
parameters are empty.

* Illustration of above example *

| $ ip link set can0 type can bitrate 500000 dbitrate 20000000 fd on
| RTNETLINK answers: Invalid argument
| $ ip --details link show can0
| 1: can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD> state STOPPED restart-ms 0
           ^^ FD flag is set without any of the databittiming parameters...
| 	  bitrate 500000 sample-point 0.875
| 	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
| 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
| 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
| 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

To prevent this from happening, we make a local copy of can_priv, work
on it, and copy it back at the very end of the function (i.e. only if
all previous checks succeeded).
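
The idea can be sketched in user space as follows. This is only an
illustration, not the kernel code: struct demo_priv, its fields and
demo_changelink() are hypothetical, much-reduced stand-ins for
struct can_priv and can_changelink().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct can_priv. */
struct demo_priv {
	bool fd_enabled;           /* plays the role of the FD ctrlmode flag */
	unsigned int data_bitrate; /* plays the role of data_bittiming */
	unsigned int bitrate_max;  /* 0 means "no limit" */
};

/* Apply a request to *live, or leave *live untouched on error. */
int demo_changelink(struct demo_priv *live, bool fd, unsigned int dbitrate)
{
	struct demo_priv staged;

	assert(live != NULL);

	/* Work on a copy so that an early return cannot leak partial state. */
	memcpy(&staged, live, sizeof(staged));

	/* The flag is set first, as in the scenario described above... */
	staged.fd_enabled = fd;

	/* ...but a failing check now only discards the staged copy. */
	if (staged.bitrate_max && dbitrate > staged.bitrate_max)
		return -EINVAL;

	staged.data_bitrate = dbitrate;

	/* All checks passed: commit the whole configuration at once. */
	memcpy(live, &staged, sizeof(*live));
	return 0;
}
```

With this shape, a rejected data bitrate leaves both the flag and the
bitrate of the live structure exactly as they were before the call.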

Once this is done, there is no longer any need for a temporary variable
per parameter. As such, the bittiming and data bittiming (bt and dbt)
are written directly to the temporary priv variable.

Finally, function can_calc_tdco() was retrieving can_priv from the
net_device and directly modifying it. We changed the prototype so that
it instead writes its changes into our temporary priv variable.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
Resending because I got no answers on:
https://lore.kernel.org/linux-can/20210823024750.702542-1-mailhol.vincent@wanadoo.fr/T/#u
(I guess everyone was busy with the upcoming merge window)

I am not sure whether or not this needs a "Fixes" tag. Just in case,
there it is:

Fixes: 9859ccd2c8be ("can: introduce the data bitrate configuration for CAN FD")

* Changelog *

v1 -> v2:
  - Change the prototype of can_calc_tdco() so that the changes are
    applied to the temporary priv instead of netdev_priv(dev).
---
 drivers/net/can/dev/bittiming.c |  8 +--
 drivers/net/can/dev/netlink.c   | 88 +++++++++++++++++----------------
 include/linux/can/bittiming.h   |  7 ++-
 3 files changed, 53 insertions(+), 50 deletions(-)

Comments

Marc Kleine-Budde Sept. 6, 2021, 8:18 a.m. UTC | #1
On 03.09.2021 16:17:04, Vincent Mailhol wrote:
> struct can_priv has a set of flags (can_priv::ctrlmode) which are
> correlated with the other fields of the structure. In
> can_changelink(), those flags are set first and copied to can_priv. If
> the function has to return early, for example due to an out of range
> value provided by the user, then the global configuration might become
> incoherent.
> 
> Example: the user provides an out of range dbitrate (e.g. 20
> Mbps). The command fails (-EINVAL), however the FD flag was already
> set resulting in a configuration where FD is on but the databittiming
> parameters are empty.
> 
> * Illustration of above example *
> 
> | $ ip link set can0 type can bitrate 500000 dbitrate 20000000 fd on
> | RTNETLINK answers: Invalid argument
> | $ ip --details link show can0
> | 1: can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
> |     link/can  promiscuity 0 minmtu 0 maxmtu 0
> |     can <FD> state STOPPED restart-ms 0
>            ^^ FD flag is set without any of the databittiming parameters...
> | 	  bitrate 500000 sample-point 0.875
> | 	  tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
> | 	  ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
> | 	  ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
> | 	  clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> 
> To prevent this from happening, we do a local copy of can_priv, work
> on it, an copy it at the very end of the function (i.e. only if all
> previous checks succeeded).

I don't like the optimization of using a static priv. If it's too big to
be allocated on the stack, allocate it on the heap, i.e. using
kmemdup()/kfree().
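
A rough user-space sketch of what a heap-allocated variant could look
like. Everything here is hypothetical: demo_memdup() merely imitates
kmemdup() with malloc()/memcpy(), struct heap_priv is a reduced
stand-in for struct can_priv, and the range check is invented for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct can_priv. */
struct heap_priv {
	unsigned int ctrlmode;
	unsigned int restart_ms;
};

/* User-space imitation of kmemdup(): duplicate len bytes on the heap. */
static void *demo_memdup(const void *src, size_t len)
{
	void *copy = malloc(len);

	if (copy)
		memcpy(copy, src, len);
	return copy;
}

/* Stage the change on a heap copy; commit only if validation passes. */
int demo_set_restart_ms(struct heap_priv *live, unsigned int restart_ms,
			unsigned int limit)
{
	struct heap_priv *staged;
	int err = 0;

	assert(live != NULL);

	staged = demo_memdup(live, sizeof(*live));
	if (!staged)
		return -ENOMEM;

	staged->restart_ms = restart_ms;
	if (restart_ms > limit) { /* invented validation step */
		err = -EINVAL;
		goto out;
	}

	memcpy(live, staged, sizeof(*live));
out:
	free(staged); /* single exit path, so the copy is always released */
	return err;
}
```

Compared with a stack copy, every early return has to go through the
cleanup label so that the temporary copy is freed.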

> Once this done, there is no more need to have a temporary variable for
> a specific parameter. As such, the bittiming and data bittiming (bt
> and dbt) are directly written to the temporary priv variable.
> 
> Finally, function can_calc_tdco() was retrieving can_priv from the
> net_device and directly modifying it. We changed the prototype so that
> it instead writes its changes into our temporary priv variable.

Is it possible to split this into a separate patch, so that the part
without the tdco can be backported more easily to older kernels not
having tdco? The patch fixing the tdco would be the 2nd patch...

> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> ---
> Resending because I got no answers on:
> https://lore.kernel.org/linux-can/20210823024750.702542-1-mailhol.vincent@wanadoo.fr/T/#u
> (I guess everyone bas busy with the upcoming merge window)

Busy yes, but not with the merge window :)

> I am not sure whether or not this needs a "Fixes" tag. Just in case,
> there it is:
> 
> Fixes: 9859ccd2c8be ("can: introduce the data bitrate configuration for CAN FD")

...if it's possible to split this patch into 2 parts, add individual
fixes tags to them.

regards,
Marc

Vincent Mailhol Sept. 6, 2021, 2:17 p.m. UTC | #2
On Mon. 6 Sep 2021 at 17:18, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 03.09.2021 16:17:04, Vincent Mailhol wrote:
> > struct can_priv has a set of flags (can_priv::ctrlmode) which are
> > correlated with the other fields of the structure. In
> > can_changelink(), those flags are set first and copied to can_priv. If
> > the function has to return early, for example due to an out of range
> > value provided by the user, then the global configuration might become
> > incoherent.
> >
> > Example: the user provides an out of range dbitrate (e.g. 20
> > Mbps). The command fails (-EINVAL), however the FD flag was already
> > set resulting in a configuration where FD is on but the databittiming
> > parameters are empty.
> >
> > * Illustration of above example *
> >
> > | $ ip link set can0 type can bitrate 500000 dbitrate 20000000 fd on
> > | RTNETLINK answers: Invalid argument
> > | $ ip --details link show can0
> > | 1: can0: <NOARP,ECHO> mtu 72 qdisc noop state DOWN mode DEFAULT group default qlen 10
> > |     link/can  promiscuity 0 minmtu 0 maxmtu 0
> > |     can <FD> state STOPPED restart-ms 0
> >            ^^ FD flag is set without any of the databittiming parameters...
> > |       bitrate 500000 sample-point 0.875
> > |       tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
> > |       ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
> > |       ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
> > |       clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
> >
> > To prevent this from happening, we do a local copy of can_priv, work
> > on it, an copy it at the very end of the function (i.e. only if all
> > previous checks succeeded).
>
> I don't like the optimization of using a static priv. If it's too big to
> be allocated on the stack, allocate it on the heap, i.e. using
> kmemdup()/kfree().

The static declaration is only an issue of coding style, correct?
Or is there an actual risk of doing so?
This is for my understanding, I will remove the static
declaration regardless of your answer.

On my x86_64 machine, sizeof(priv) is 448 and if I declare priv on the stack:
| $ objdump -d drivers/net/can/dev/netlink.o | ./scripts/checkstack.pl
| 0x00000000000002100 can_changelink []:            1200

So I will allocate it on the heap.

N.B. In above figures CONFIG_CAN_LEDS is *off* because that driver
was tagged as broken in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30f3b42147ba6f29bc95c1bba34468740762d91b

> > Once this done, there is no more need to have a temporary variable for
> > a specific parameter. As such, the bittiming and data bittiming (bt
> > and dbt) are directly written to the temporary priv variable.
> >
> > Finally, function can_calc_tdco() was retrieving can_priv from the
> > net_device and directly modifying it. We changed the prototype so that
> > it instead writes its changes into our temporary priv variable.
>
> Is it possible to split this into a separate patch, so that the part
> without the tdco can be backported more easily to older kernels not
> having tdco? The patch fixing the tdco would be the 2nd patch...

ACK. I will send a v3 with that split.

> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> > ---
> > Resending because I got no answers on:
> > https://lore.kernel.org/linux-can/20210823024750.702542-1-mailhol.vincent@wanadoo.fr/T/#u
> > (I guess everyone bas busy with the upcoming merge window)
>
> Busy yes, but not with the merge window :)
>
> > I am not sure whether or not this needs a "Fixes" tag. Just in case,
> > there it is:
> >
> > Fixes: 9859ccd2c8be ("can: introduce the data bitrate configuration for CAN FD")
>
> ...if it's possible to split this patch into 2 parts, add individual
> fixes tags to them.

ACK.


> regards,
> Marc
>
> --
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Marc Kleine-Budde Sept. 6, 2021, 2:30 p.m. UTC | #3
On 06.09.2021 23:17:40, Vincent MAILHOL wrote:
> > > To prevent this from happening, we do a local copy of can_priv, work
> > > on it, an copy it at the very end of the function (i.e. only if all
> > > previous checks succeeded).
> >
> > I don't like the optimization of using a static priv. If it's too big to
> > be allocated on the stack, allocate it on the heap, i.e. using
> > kmemdup()/kfree().
> 
> The static declaration is only an issue of coding style, correct?

I don't know (but I haven't checked) if the coding style doc says
anything about that.

> Or is there an actual risk of doing so?

As you pointed out, this relies on the serialization of the changelink
callback by the networking stack. There's no sane way in C to track this
requirement in the networking stack, so I don't want to have any
roadblocks and/or potential bugs in the CAN code. Marking a variable as
static places it in the BSS section, right? This means the memory is
always "used", even when the bitrate is not being set.
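
A small user-space illustration of the concern (hypothetical names,
not the CAN code): a function-local static is a single object shared by
every caller, so it is only safe as long as all callers are serialized
from the outside.

```c
#include <assert.h>

/* Hypothetical scratch structure standing in for a static can_priv. */
struct scratch {
	unsigned int ctrlmode;
};

/* Every call hands out the same object, which exists for the whole
 * lifetime of the program whether or not anyone is using it.
 */
struct scratch *stage_static(unsigned int ctrlmode)
{
	static struct scratch s;

	s.ctrlmode = ctrlmode;
	return &s;
}

/* Returns 1 if two "independent" callers end up sharing one buffer. */
int callers_share_buffer(void)
{
	struct scratch *a = stage_static(1);
	struct scratch *b = stage_static(2);

	/* The first caller's staged value has been overwritten. */
	assert(a->ctrlmode == 2);
	return a == b;
}
```

Nothing in the function itself expresses the locking requirement; it
lives only in the callers, which is the part that cannot be tracked.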

> This is for my understanding, I will remove the static
> declaration regardless of your answer.

tnx

> On my x86_64 machine, sizeof(priv) is 448 and if I declare priv on the stack:
> | $ objdump -d drivers/net/can/dev/netlink.o | ./scripts/checkstack.pl
> | 0x00000000000002100 can_changelink []:            1200
> 
> So I will allocate it on the heap.

Sounds reasonable.

> N.B. In above figures CONFIG_CAN_LEDS is *off* because that driver
> was tagged as broken in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=30f3b42147ba6f29bc95c1bba34468740762d91b

ok - BTW: I think we can remove LEDs support now, it's marked as broken
for more than 3 years.

> > > Once this done, there is no more need to have a temporary variable for
> > > a specific parameter. As such, the bittiming and data bittiming (bt
> > > and dbt) are directly written to the temporary priv variable.
> > >
> > > Finally, function can_calc_tdco() was retrieving can_priv from the
> > > net_device and directly modifying it. We changed the prototype so that
> > > it instead writes its changes into our temporary priv variable.
> >
> > Is it possible to split this into a separate patch, so that the part
> > without the tdco can be backported more easily to older kernels not
> > having tdco? The patch fixing the tdco would be the 2nd patch...
> 
> ACK. I will send a v3 with that split.

Thanks for helping to take care of the LTS kernels!

regards,
Marc

Patch

diff --git a/drivers/net/can/dev/bittiming.c b/drivers/net/can/dev/bittiming.c
index f49170eadd54..bddd93e2e439 100644
--- a/drivers/net/can/dev/bittiming.c
+++ b/drivers/net/can/dev/bittiming.c
@@ -175,13 +175,9 @@  int can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt,
 	return 0;
 }
 
-void can_calc_tdco(struct net_device *dev)
+void can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const,
+		   const struct can_bittiming *dbt)
 {
-	struct can_priv *priv = netdev_priv(dev);
-	const struct can_bittiming *dbt = &priv->data_bittiming;
-	struct can_tdc *tdc = &priv->tdc;
-	const struct can_tdc_const *tdc_const = priv->tdc_const;
-
 	if (!tdc_const)
 		return;
 
diff --git a/drivers/net/can/dev/netlink.c b/drivers/net/can/dev/netlink.c
index 80425636049d..50dfed462711 100644
--- a/drivers/net/can/dev/netlink.c
+++ b/drivers/net/can/dev/netlink.c
@@ -58,14 +58,20 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 			  struct nlattr *data[],
 			  struct netlink_ext_ack *extack)
 {
-	struct can_priv *priv = netdev_priv(dev);
+	/* Work on a local copy of priv to prevent inconsistent value
+	 * in case of early return. net/core/rtnetlink.c has a global
+	 * mutex so using a static declaration is race free
+	 */
+	static struct can_priv priv;
 	int err;
 
 	/* We need synchronization with dev->stop() */
 	ASSERT_RTNL();
 
+	memcpy(&priv, netdev_priv(dev), sizeof(priv));
+
 	if (data[IFLA_CAN_BITTIMING]) {
-		struct can_bittiming bt;
+		struct can_bittiming *bt = &priv.bittiming;
 
 		/* Do not allow changing bittiming while running */
 		if (dev->flags & IFF_UP)
@@ -76,28 +82,26 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 		 * directly via do_set_bitrate(). Bail out if neither
 		 * is given.
 		 */
-		if (!priv->bittiming_const && !priv->do_set_bittiming)
+		if (!priv.bittiming_const && !priv.do_set_bittiming)
 			return -EOPNOTSUPP;
 
-		memcpy(&bt, nla_data(data[IFLA_CAN_BITTIMING]), sizeof(bt));
-		err = can_get_bittiming(dev, &bt,
-					priv->bittiming_const,
-					priv->bitrate_const,
-					priv->bitrate_const_cnt);
+		memcpy(bt, nla_data(data[IFLA_CAN_BITTIMING]), sizeof(*bt));
+		err = can_get_bittiming(dev, bt,
+					priv.bittiming_const,
+					priv.bitrate_const,
+					priv.bitrate_const_cnt);
 		if (err)
 			return err;
 
-		if (priv->bitrate_max && bt.bitrate > priv->bitrate_max) {
+		if (priv.bitrate_max && bt->bitrate > priv.bitrate_max) {
 			netdev_err(dev, "arbitration bitrate surpasses transceiver capabilities of %d bps\n",
-				   priv->bitrate_max);
+				   priv.bitrate_max);
 			return -EINVAL;
 		}
 
-		memcpy(&priv->bittiming, &bt, sizeof(bt));
-
-		if (priv->do_set_bittiming) {
+		if (priv.do_set_bittiming) {
 			/* Finally, set the bit-timing registers */
-			err = priv->do_set_bittiming(dev);
+			err = priv.do_set_bittiming(dev);
 			if (err)
 				return err;
 		}
@@ -112,11 +116,11 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 		if (dev->flags & IFF_UP)
 			return -EBUSY;
 		cm = nla_data(data[IFLA_CAN_CTRLMODE]);
-		ctrlstatic = priv->ctrlmode_static;
+		ctrlstatic = priv.ctrlmode_static;
 		maskedflags = cm->flags & cm->mask;
 
 		/* check whether provided bits are allowed to be passed */
-		if (maskedflags & ~(priv->ctrlmode_supported | ctrlstatic))
+		if (maskedflags & ~(priv.ctrlmode_supported | ctrlstatic))
 			return -EOPNOTSUPP;
 
 		/* do not check for static fd-non-iso if 'fd' is disabled */
@@ -128,16 +132,16 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 			return -EOPNOTSUPP;
 
 		/* clear bits to be modified and copy the flag values */
-		priv->ctrlmode &= ~cm->mask;
-		priv->ctrlmode |= maskedflags;
+		priv.ctrlmode &= ~cm->mask;
+		priv.ctrlmode |= maskedflags;
 
 		/* CAN_CTRLMODE_FD can only be set when driver supports FD */
-		if (priv->ctrlmode & CAN_CTRLMODE_FD) {
+		if (priv.ctrlmode & CAN_CTRLMODE_FD) {
 			dev->mtu = CANFD_MTU;
 		} else {
 			dev->mtu = CAN_MTU;
-			memset(&priv->data_bittiming, 0,
-			       sizeof(priv->data_bittiming));
+			memset(&priv.data_bittiming, 0,
+			       sizeof(priv.data_bittiming));
 		}
 	}
 
@@ -145,7 +149,7 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 		/* Do not allow changing restart delay while running */
 		if (dev->flags & IFF_UP)
 			return -EBUSY;
-		priv->restart_ms = nla_get_u32(data[IFLA_CAN_RESTART_MS]);
+		priv.restart_ms = nla_get_u32(data[IFLA_CAN_RESTART_MS]);
 	}
 
 	if (data[IFLA_CAN_RESTART]) {
@@ -158,7 +162,7 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 	}
 
 	if (data[IFLA_CAN_DATA_BITTIMING]) {
-		struct can_bittiming dbt;
+		struct can_bittiming *dbt = &priv.data_bittiming;
 
 		/* Do not allow changing bittiming while running */
 		if (dev->flags & IFF_UP)
@@ -169,31 +173,29 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 		 * directly via do_set_bitrate(). Bail out if neither
 		 * is given.
 		 */
-		if (!priv->data_bittiming_const && !priv->do_set_data_bittiming)
+		if (!priv.data_bittiming_const && !priv.do_set_data_bittiming)
 			return -EOPNOTSUPP;
 
-		memcpy(&dbt, nla_data(data[IFLA_CAN_DATA_BITTIMING]),
-		       sizeof(dbt));
-		err = can_get_bittiming(dev, &dbt,
-					priv->data_bittiming_const,
-					priv->data_bitrate_const,
-					priv->data_bitrate_const_cnt);
+		memcpy(dbt, nla_data(data[IFLA_CAN_DATA_BITTIMING]),
+		       sizeof(*dbt));
+		err = can_get_bittiming(dev, dbt,
+					priv.data_bittiming_const,
+					priv.data_bitrate_const,
+					priv.data_bitrate_const_cnt);
 		if (err)
 			return err;
 
-		if (priv->bitrate_max && dbt.bitrate > priv->bitrate_max) {
+		if (priv.bitrate_max && dbt->bitrate > priv.bitrate_max) {
 			netdev_err(dev, "canfd data bitrate surpasses transceiver capabilities of %d bps\n",
-				   priv->bitrate_max);
+				   priv.bitrate_max);
 			return -EINVAL;
 		}
 
-		memcpy(&priv->data_bittiming, &dbt, sizeof(dbt));
-
-		can_calc_tdco(dev);
+		can_calc_tdco(&priv.tdc, priv.tdc_const, &priv.data_bittiming);
 
-		if (priv->do_set_data_bittiming) {
+		if (priv.do_set_data_bittiming) {
 			/* Finally, set the bit-timing registers */
-			err = priv->do_set_data_bittiming(dev);
+			err = priv.do_set_data_bittiming(dev);
 			if (err)
 				return err;
 		}
@@ -201,28 +203,30 @@  static int can_changelink(struct net_device *dev, struct nlattr *tb[],
 
 	if (data[IFLA_CAN_TERMINATION]) {
 		const u16 termval = nla_get_u16(data[IFLA_CAN_TERMINATION]);
-		const unsigned int num_term = priv->termination_const_cnt;
+		const unsigned int num_term = priv.termination_const_cnt;
 		unsigned int i;
 
-		if (!priv->do_set_termination)
+		if (!priv.do_set_termination)
 			return -EOPNOTSUPP;
 
 		/* check whether given value is supported by the interface */
 		for (i = 0; i < num_term; i++) {
-			if (termval == priv->termination_const[i])
+			if (termval == priv.termination_const[i])
 				break;
 		}
 		if (i >= num_term)
 			return -EINVAL;
 
 		/* Finally, set the termination value */
-		err = priv->do_set_termination(dev, termval);
+		err = priv.do_set_termination(dev, termval);
 		if (err)
 			return err;
 
-		priv->termination = termval;
+		priv.termination = termval;
 	}
 
+	memcpy(netdev_priv(dev), &priv, sizeof(priv));
+
 	return 0;
 }
 
diff --git a/include/linux/can/bittiming.h b/include/linux/can/bittiming.h
index 9de6e9053e34..b3c1711ee0f0 100644
--- a/include/linux/can/bittiming.h
+++ b/include/linux/can/bittiming.h
@@ -87,7 +87,8 @@  struct can_tdc_const {
 int can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt,
 		       const struct can_bittiming_const *btc);
 
-void can_calc_tdco(struct net_device *dev);
+void can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const,
+		   const struct can_bittiming *dbt);
 #else /* !CONFIG_CAN_CALC_BITTIMING */
 static inline int
 can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt,
@@ -97,7 +98,9 @@  can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt,
 	return -EINVAL;
 }
 
-static inline void can_calc_tdco(struct net_device *dev)
+static inline void
+can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const,
+	      const struct can_bittiming *dbt)
 {
 }
 #endif /* CONFIG_CAN_CALC_BITTIMING */