diff mbox series

can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8

Message ID 20220212112713.577957-1-mailhol.vincent@wanadoo.fr (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series can: etas_es58x: change opened_channel_cnt's type from atomic_t to u8 | expand

Checks

Context Check Description
netdev/tree_selection success Series ignored based on subject

Commit Message

Vincent Mailhol Feb. 12, 2022, 11:27 a.m. UTC
The driver uses an atomic_t variable: es58x_device:opened_channel_cnt
to keep track of the number of opened channels in order to only
allocate memory for the URBs when this count changes from zero to one.

While the intent was to prevent race conditions, the choice of an
atomic_t turns out to be a bad idea for several reasons:

  - implementation is incorrect and fails to decrement
    opened_channel_cnt when the URB allocation fails as reported in
    [1].

  - even if opened_channel_cnt were to be correctly decremented,
    atomic_t is insufficient to cover edge cases: there can be a race
    condition in which 1/ a first process fails to allocate URBs
    memory 2/ a second process enters es58x_open() before the first
    process does its cleanup and decrements opened_channed_cnt. In
    which case, the second process would successfully return despite
    the URBs memory not being allocated.

  - actually, any kind of locking mechanism was useless here because
    it is redundant with the network stack big kernel lock
    (a.k.a. rtnl_lock) which is being hold by all the callers of
    net_device_ops:ndo_open() and net_device_ops:ndo_close(). c.f. the
    ASSERST_RTNL() calls in __dev_open() [2] and __dev_close_many()
    [3].

The atmomic_t is thus replaced by a simple u8 type and the logic to
increment and decrement es58x_device:opened_channel_cnt is simplified
accordingly fixing the bug reported in [1]. We do not check again for
ASSERST_RTNL() as this is already done by the callers.

[1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
[2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
[3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541

Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X
CAN USB interfaces")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 drivers/net/can/usb/etas_es58x/es58x_core.c | 9 +++++----
 drivers/net/can/usb/etas_es58x/es58x_core.h | 8 +++++---
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

Marc Kleine-Budde Feb. 12, 2022, 3:57 p.m. UTC | #1
On 12.02.2022 20:27:13, Vincent Mailhol wrote:
> The driver uses an atomic_t variable: es58x_device:opened_channel_cnt
> to keep track of the number of opened channels in order to only
> allocate memory for the URBs when this count changes from zero to one.
> 
> While the intent was to prevent race conditions, the choice of an
> atomic_t turns out to be a bad idea for several reasons:
> 
>   - implementation is incorrect and fails to decrement
>     opened_channel_cnt when the URB allocation fails as reported in
>     [1].
> 
>   - even if opened_channel_cnt were to be correctly decremented,
>     atomic_t is insufficient to cover edge cases: there can be a race
>     condition in which 1/ a first process fails to allocate URBs
>     memory 2/ a second process enters es58x_open() before the first
>     process does its cleanup and decrements opened_channed_cnt. In
>     which case, the second process would successfully return despite
>     the URBs memory not being allocated.
> 
>   - actually, any kind of locking mechanism was useless here because
>     it is redundant with the network stack big kernel lock
>     (a.k.a. rtnl_lock) which is being hold by all the callers of
>     net_device_ops:ndo_open() and net_device_ops:ndo_close(). c.f. the
>     ASSERST_RTNL() calls in __dev_open() [2] and __dev_close_many()
>     [3].
> 
> The atmomic_t is thus replaced by a simple u8 type and the logic to
> increment and decrement es58x_device:opened_channel_cnt is simplified
> accordingly fixing the bug reported in [1]. We do not check again for
> ASSERST_RTNL() as this is already done by the callers.
> 
> [1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
> [2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
> [3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541
> 
> Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X
> CAN USB interfaces")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>

Applied to can/testing.

I you (or someone else) wants to increase their patch count feel free to
convert the other USB CAN drivers from atomic_t to u8, too.

Thanks,
Marc
Vincent Mailhol Feb. 14, 2022, 5:37 a.m. UTC | #2
On Sun. 13 Feb 2022 at 00:57, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 12.02.2022 20:27:13, Vincent Mailhol wrote:
> > The driver uses an atomic_t variable: es58x_device:opened_channel_cnt
> > to keep track of the number of opened channels in order to only
> > allocate memory for the URBs when this count changes from zero to one.
> >
> > While the intent was to prevent race conditions, the choice of an
> > atomic_t turns out to be a bad idea for several reasons:
> >
> >   - implementation is incorrect and fails to decrement
> >     opened_channel_cnt when the URB allocation fails as reported in
> >     [1].
> >
> >   - even if opened_channel_cnt were to be correctly decremented,
> >     atomic_t is insufficient to cover edge cases: there can be a race
> >     condition in which 1/ a first process fails to allocate URBs
> >     memory 2/ a second process enters es58x_open() before the first
> >     process does its cleanup and decrements opened_channed_cnt. In
> >     which case, the second process would successfully return despite
> >     the URBs memory not being allocated.
> >
> >   - actually, any kind of locking mechanism was useless here because
> >     it is redundant with the network stack big kernel lock
> >     (a.k.a. rtnl_lock) which is being hold by all the callers of
> >     net_device_ops:ndo_open() and net_device_ops:ndo_close(). c.f. the
> >     ASSERST_RTNL() calls in __dev_open() [2] and __dev_close_many()
> >     [3].
> >
> > The atmomic_t is thus replaced by a simple u8 type and the logic to
> > increment and decrement es58x_device:opened_channel_cnt is simplified
> > accordingly fixing the bug reported in [1]. We do not check again for
> > ASSERST_RTNL() as this is already done by the callers.
> >
> > [1] https://lore.kernel.org/linux-can/20220201140351.GA2548@kili/T/#u
> > [2] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1463
> > [3] https://elixir.bootlin.com/linux/v5.16/source/net/core/dev.c#L1541
> >
> > Fixes: 8537257874e9 ("can: etas_es58x: add core support for ETAS ES58X
> > CAN USB interfaces")
> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> > Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
>
> Applied to can/testing.
>
> I you (or someone else) wants to increase their patch count feel free to
> convert the other USB CAN drivers from atomic_t to u8, too.

Actually, not so many drivers are impacted:

| $ grep -R atomic_t drivers/net/can/
| drivers/net/can/c_can/c_can.h:  atomic_t sie_pending;
| drivers/net/can/usb/esd_usb2.c: atomic_t active_tx_jobs;
| drivers/net/can/usb/ems_usb.c:  atomic_t active_tx_urbs;
| drivers/net/can/usb/gs_usb.c:   atomic_t active_tx_urbs;
| drivers/net/can/usb/gs_usb.c:   atomic_t active_channels;
| drivers/net/can/usb/mcba_usb.c: atomic_t free_ctx_cnt;
| drivers/net/can/usb/usb_8dev.c: atomic_t active_tx_urbs;
| drivers/net/can/usb/peak_usb/pcan_usb_core.h:   atomic_t active_tx_urbs;
| drivers/net/can/usb/etas_es58x/es58x_core.h:    atomic_t tx_urbs_idle_cnt;
| drivers/net/can/usb/etas_es58x/es58x_core.c:    atomic_t *idle_cnt =
&es58x_dev->tx_urbs_idle_cnt;

The only relevant one seems to be the gs_usb with its atomic_t
active_channels. I looked at the code, the change to u8 shouldn’t
be too hard. But aside from that, I am also concerned by the
absence of an exit path in gs_can_open() to free the allocated
URB memory when an error occurs.

I will send a patch to change the active_channels from
atomic_t to u8, however, I will not rework the error path to
free the allocated URB memory.

Also, we need to double check that none of the drivers uses a
spinlock or mutex in their open() or close() functions. I gave it
a first glance and didn't find anything outstanding but I will
need to spend a bit of extra time on that to confirm.


Yours sincerely,
Vincent Mailhol
diff mbox series

Patch

diff --git a/drivers/net/can/usb/etas_es58x/es58x_core.c b/drivers/net/can/usb/etas_es58x/es58x_core.c
index 2ed2370a3166..2d73ebbf3836 100644
--- a/drivers/net/can/usb/etas_es58x/es58x_core.c
+++ b/drivers/net/can/usb/etas_es58x/es58x_core.c
@@ -1787,7 +1787,7 @@  static int es58x_open(struct net_device *netdev)
 	struct es58x_device *es58x_dev = es58x_priv(netdev)->es58x_dev;
 	int ret;
 
-	if (atomic_inc_return(&es58x_dev->opened_channel_cnt) == 1) {
+	if (!es58x_dev->opened_channel_cnt) {
 		ret = es58x_alloc_rx_urbs(es58x_dev);
 		if (ret)
 			return ret;
@@ -1805,12 +1805,13 @@  static int es58x_open(struct net_device *netdev)
 	if (ret)
 		goto free_urbs;
 
+	es58x_dev->opened_channel_cnt++;
 	netif_start_queue(netdev);
 
 	return ret;
 
  free_urbs:
-	if (atomic_dec_and_test(&es58x_dev->opened_channel_cnt))
+	if (!es58x_dev->opened_channel_cnt)
 		es58x_free_urbs(es58x_dev);
 	netdev_err(netdev, "%s: Could not open the network device: %pe\n",
 		   __func__, ERR_PTR(ret));
@@ -1845,7 +1846,8 @@  static int es58x_stop(struct net_device *netdev)
 
 	es58x_flush_pending_tx_msg(netdev);
 
-	if (atomic_dec_and_test(&es58x_dev->opened_channel_cnt))
+	es58x_dev->opened_channel_cnt--;
+	if (!es58x_dev->opened_channel_cnt)
 		es58x_free_urbs(es58x_dev);
 
 	return 0;
@@ -2215,7 +2217,6 @@  static struct es58x_device *es58x_init_es58x_dev(struct usb_interface *intf,
 	init_usb_anchor(&es58x_dev->tx_urbs_idle);
 	init_usb_anchor(&es58x_dev->tx_urbs_busy);
 	atomic_set(&es58x_dev->tx_urbs_idle_cnt, 0);
-	atomic_set(&es58x_dev->opened_channel_cnt, 0);
 	usb_set_intfdata(intf, es58x_dev);
 
 	es58x_dev->rx_pipe = usb_rcvbulkpipe(es58x_dev->udev,
diff --git a/drivers/net/can/usb/etas_es58x/es58x_core.h b/drivers/net/can/usb/etas_es58x/es58x_core.h
index 826a15871573..e5033cb5e695 100644
--- a/drivers/net/can/usb/etas_es58x/es58x_core.h
+++ b/drivers/net/can/usb/etas_es58x/es58x_core.h
@@ -373,8 +373,6 @@  struct es58x_operators {
  *	queue wake/stop logic should prevent this URB from getting
  *	empty. Please refer to es58x_get_tx_urb() for more details.
  * @tx_urbs_idle_cnt: number of urbs in @tx_urbs_idle.
- * @opened_channel_cnt: number of channels opened (c.f. es58x_open()
- *	and es58x_stop()).
  * @ktime_req_ns: kernel timestamp when es58x_set_realtime_diff_ns()
  *	was called.
  * @realtime_diff_ns: difference in nanoseconds between the clocks of
@@ -384,6 +382,10 @@  struct es58x_operators {
  *	in RX branches.
  * @rx_max_packet_size: Maximum length of bulk-in URB.
  * @num_can_ch: Number of CAN channel (i.e. number of elements of @netdev).
+ * @opened_channel_cnt: number of channels opened. Free of race
+ *	conditions because its two users (net_device_ops:ndo_open()
+ *	and net_device_ops:ndo_close()) guarantee that the network
+ *	stack big kernel lock (a.k.a. rtnl_mutex) is being hold.
  * @rx_cmd_buf_len: Length of @rx_cmd_buf.
  * @rx_cmd_buf: The device might split the URB commands in an
  *	arbitrary amount of pieces. This buffer is used to concatenate
@@ -406,7 +408,6 @@  struct es58x_device {
 	struct usb_anchor tx_urbs_busy;
 	struct usb_anchor tx_urbs_idle;
 	atomic_t tx_urbs_idle_cnt;
-	atomic_t opened_channel_cnt;
 
 	u64 ktime_req_ns;
 	s64 realtime_diff_ns;
@@ -415,6 +416,7 @@  struct es58x_device {
 
 	u16 rx_max_packet_size;
 	u8 num_can_ch;
+	u8 opened_channel_cnt;
 
 	u16 rx_cmd_buf_len;
 	union es58x_urb_cmd rx_cmd_buf;