Message ID | 20221203133159.94414-1-mailhol.vincent@wanadoo.fr (mailing list archive) |
---|---|
Series | can: usb: remove all usb_set_intfdata(intf, NULL) in drivers' disconnect() |
On 03.12.22 14:31, Vincent Mailhol wrote:
> The core sets the usb_interface to NULL in [1]. Also setting it to
> NULL in usb_driver::disconnect() is at best useless, at worst risky.

Hi,

I am afraid there is a major issue with your series of patches.
The drivers you are removing this from often have a subsequent check
for the data they got from usb_get_intfdata() being NULL.

That pattern is taken from drivers like btusb or CDC-ACM, which
claim secondary interfaces disconnect() will be called a second time
for.
In addition, a driver can use setting intfdata to NULL as a flag
for disconnect() having proceeded to a point where certain things
can no longer be safely done.
You need to check for that in every driver
you remove this code from and if you decide that it can safely be removed,
which is likely, then please also remove checks like this:

        struct ems_usb *dev = usb_get_intfdata(intf);

        usb_set_intfdata(intf, NULL);

        if (dev) {
                unregister_netdev(dev->netdev);

Either it can be called a second time, then you need to leave it
as is, or the check for NULL is superfluous. But only removing setting
the pointer to NULL never makes sense.

        Regards
                Oliver
On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> On 03.12.22 14:31, Vincent Mailhol wrote:
> > The core sets the usb_interface to NULL in [1]. Also setting it to
> > NULL in usb_driver::disconnect() is at best useless, at worst risky.
>
> Hi,
>
> I am afraid there is a major issue with your series of patches.
> The drivers you are removing this from often have a subsequent check
> for the data they got from usb_get_intfdata() being NULL.

ACK, but I do not see the connection.

> That pattern is taken from drivers like btusb or CDC-ACM

Where does CDC-ACM set *his* interface to NULL? Looking at:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/class/cdc-acm.c#L1531

I can see that cdc-acm sets acm->control and acm->data to NULL in his
disconnect(), but it doesn't set its own usb_interface to NULL.

> which claim secondary interfaces disconnect() will be called a second time
> for.

Are you saying that the disconnect() of those CAN USB drivers is being
called twice? I do not see this in the source code. The only caller of
usb_driver::disconnect() I can see is:

  https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

> In addition, a driver can use setting intfdata to NULL as a flag
> for disconnect() having proceeded to a point where certain things
> can no longer be safely done.

Any reference that a driver can do that? This pattern seems racy.

By the way, I did check all the drivers:

  * ems_usb: intf is only used in ems_usb_probe() and
    ems_usb_disconnect() functions.

  * esd_usb: intf is only used in the esd_usb_probe(),
    esd_usb_probe_one_net() (which is part of probing),
    esd_usb_disconnect() and a couple of sysfs functions (which only
    use intf to get a pointer to struct esd_usb).

  * gs_usb: intf is used several times but only to retrieve struct
    usb_device. This seems useless, I sent this patch to remove it:
    https://lore.kernel.org/linux-can/20221208081142.16936-3-mailhol.vincent@wanadoo.fr/
    Aside from that, intf is only used in gs_usb_probe(),
    gs_make_candev() (which is part of probing) and
    gs_usb_disconnect() functions.

  * kvaser_usb: intf is only used in kvaser_usb_probe() and
    kvaser_usb_disconnect() functions.

  * mcba_usb: intf is only used in mcba_usb_probe() and
    mcba_usb_disconnect() functions.

  * ucan: intf is only used in ucan_probe() and
    ucan_disconnect(). struct ucan_priv also has a pointer to intf
    but it is never used. I sent this patch to remove it:
    https://lore.kernel.org/linux-can/20221208081142.16936-2-mailhol.vincent@wanadoo.fr/

  * usb_8dev: intf is only used in usb_8dev_probe() and
    usb_8dev_disconnect().

With no significant use of intf outside of the probe() and
disconnect(), there is definitely no such "use intf as a flag" in any
of these drivers.

> You need to check for that in every driver
> you remove this code from and if you decide that it can safely be removed,

What makes you assume that I didn't check this in the first place? Or
do you see something I missed?

> which is likely, then please also remove checks like this:
>
>         struct ems_usb *dev = usb_get_intfdata(intf);
>
>         usb_set_intfdata(intf, NULL);
>
>         if (dev) {
>                 unregister_netdev(dev->netdev);

How is the if (dev) check related? There is no correlation between
setting intf to NULL and dev not being NULL. I think dev is never
NULL, but I did not assess that dev could not be NULL.

> Either it can be called a second time, then you need to leave it
> as is,

Really?! The first thing disconnect() does is calling
usb_get_intfdata(intf) which dereferences intf without checking if it
is NULL, cf.:

  https://elixir.bootlin.com/linux/v6.0/source/include/linux/usb.h#L265

Then it sets intf to NULL. The second time you call disconnect(), the
usb_get_intfdata(intf) would be a NULL pointer dereference.

> or the check for NULL is superfluous. But only removing setting
> the pointer to NULL never makes sense.

Yours sincerely,
Vincent Mailhol
On 08.12.22 10:00, Vincent MAILHOL wrote:
> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>> On 03.12.22 14:31, Vincent Mailhol wrote:

Good Morning!

> ACK, but I do not see the connection.

Well, useless checks are bad. In particular, we should always
make it clear whether a pointer may or may not be NULL.
That is, I have no problem with what you were trying to do
with your patch set. It is a good idea and possibly slightly
overdue. The problem is the method.

> I can see that cdc-acm sets acm->control and acm->data to NULL in his
> disconnect(), but it doesn't set its own usb_interface to NULL.

You don't have to, but you can. I was explaining the two patterns for doing so.

>> which claim secondary interfaces disconnect() will be called a second time
>> for.
>
> Are you saying that the disconnect() of those CAN USB drivers is being
> called twice? I do not see this in the source code. The only caller of
> usb_driver::disconnect() I can see is:
>
> https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458

If they use usb_claim_interface(), yes it is called twice. Once per
interface. That is in the case of ACM once for the originally probed
interface and a second time for the claimed interface.
But not necessarily in that order, as you can be kicked off an interface
via sysfs. Yet you need to cease operations as soon as you are disconnected
from any interface. That is annoying because it means you cannot use a
refcount. From that stems the widespread use of intfdata as a flag.

>> In addition, a driver can use setting intfdata to NULL as a flag
>> for disconnect() having proceeded to a point where certain things
>> can no longer be safely done.
>
> Any reference that a driver can do that? This pattern seems racy.

Technically that is exactly what drivers that use usb_claim_interface()
do. You free everything at the first call and use intfdata as a flag
to prevent a double free.
The race is prevented by usbcore locking, which guarantees that probe()
and disconnect() have mutual exclusion.
If you use intfdata in sysfs, yes additional locking is needed.

> What makes you assume that I didn't check this in the first place? Or
> do you see something I missed?

That you did not put it into the changelogs.
That reads like the drivers are doing something obsolete or stupid.
They do not. They copied something that is necessary only under
some circumstances.

And that you did not remove the checks.

>> which is likely, then please also remove checks like this:
>>
>>         struct ems_usb *dev = usb_get_intfdata(intf);
>>
>>         usb_set_intfdata(intf, NULL);
>>
>>         if (dev) {

Here. If you have a driver that uses usb_claim_interface().
You need this check or you unregister an already unregistered
netdev.

The way this disconnect() method is coded is extremely defensive.
Most drivers do not need this check. But it is never
wrong in the strict sense.

Hence doing a mass removal with a change log that does
not say that this driver is using only a single interface
hence the check can be dropped to reduce code size
is not good.

        Regards
                Oliver
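As an illustration of the point about single-interface drivers, here is a userspace sketch of the two self-consistent forms such a disconnect() routine can take. Everything prefixed with mock_ is a hypothetical stand-in and not the kernel API: struct mock_intf keeps only the driver-data slot of struct usb_interface, and an integer flag stands in for the registered netdev. The sketch assumes disconnect() runs exactly once for the bound interface.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct usb_interface: only the drvdata slot. */
struct mock_intf {
	void *drvdata;
};

/* Hypothetical stand-in for a driver's private data (e.g. struct ems_usb). */
struct mock_dev {
	int netdev_registered;
};

static void *mock_get_intfdata(struct mock_intf *intf)
{
	return intf->drvdata;
}

static void mock_set_intfdata(struct mock_intf *intf, void *data)
{
	intf->drvdata = data;
}

/* Defensive form: clear intfdata AND guard the teardown with a NULL check. */
static void disconnect_defensive(struct mock_intf *intf)
{
	struct mock_dev *dev = mock_get_intfdata(intf);

	mock_set_intfdata(intf, NULL);

	if (dev)
		dev->netdev_registered = 0; /* stands in for unregister_netdev() */
}

/*
 * Trimmed form: the clearing and the NULL check are dropped together.
 * Only valid under the assumption that the driver owns a single
 * interface, so this routine never sees intfdata already cleared.
 */
static void disconnect_trimmed(struct mock_intf *intf)
{
	struct mock_dev *dev = mock_get_intfdata(intf);

	dev->netdev_registered = 0; /* stands in for unregister_netdev() */
}
```

With a single interface both forms perform the same teardown. The trimmed form is safe only because no second call can arrive with intfdata already cleared, which is the property a changelog would need to state.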
On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
> On 08.12.22 10:00, Vincent MAILHOL wrote:
> > On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
> >> On 03.12.22 14:31, Vincent Mailhol wrote:
>
> Good Morning!

Good night! (different time zone :))

> > ACK, but I do not see the connection.
> Well, useless checks are bad. In particular, we should always
> make it clear whether a pointer may or may not be NULL.
> That is, I have no problem with what you were trying to do
> with your patch set. It is a good idea and possibly slightly
> overdue. The problem is the method.
>
> > I can see that cdc-acm sets acm->control and acm->data to NULL in his
> > disconnect(), but it doesn't set its own usb_interface to NULL.
>
> You don't have to, but you can. I was explaining the two patterns for doing so.
>
> >> which claim secondary interfaces disconnect() will be called a second time
> >> for.
> >
> > Are you saying that the disconnect() of those CAN USB drivers is being
> > called twice? I do not see this in the source code. The only caller of
> > usb_driver::disconnect() I can see is:
> >
> > https://elixir.bootlin.com/linux/v6.0/source/drivers/usb/core/driver.c#L458
>
> If they use usb_claim_interface(), yes it is called twice. Once per
> interface. That is in the case of ACM once for the originally probed
> interface and a second time for the claimed interface.
> But not necessarily in that order, as you can be kicked off an interface
> via sysfs. Yet you need to cease operations as soon as you are disconnected
> from any interface. That is annoying because it means you cannot use a
> refcount. From that stems the widespread use of intfdata as a flag.

Thank you for the details! I better understand this part now.

> >> In addition, a driver can use setting intfdata to NULL as a flag
> >> for disconnect() having proceeded to a point where certain things
> >> can no longer be safely done.
> >
> > Any reference that a driver can do that? This pattern seems racy.
>
> Technically that is exactly what drivers that use usb_claim_interface()
> do. You free everything at the first call and use intfdata as a flag
> to prevent a double free.
> The race is prevented by usbcore locking, which guarantees that probe()
> and disconnect() have mutual exclusion.
> If you use intfdata in sysfs, yes additional locking is needed.

ACK for the mutual exclusion. My question was about what you said in
your previous message:

| In addition, a driver can use setting intfdata to NULL as a flag
| for *disconnect() having proceeded to a point* where certain things
| can no longer be safely done.

How do you check that disconnect() has proceeded *to a given point*
using intf without being racy? You can check if it has already
completed once but not check how far it has proceeded, right?

> > What makes you assume that I didn't check this in the first place? Or
> > do you see something I missed?
>
> That you did not put it into the changelogs.
> That reads like the drivers are doing something obsolete or stupid.
> They do not. They copied something that is necessary only under
> some circumstances.
>
> And that you did not remove the checks.
>
> >> which is likely, then please also remove checks like this:
> >>
> >>         struct ems_usb *dev = usb_get_intfdata(intf);
> >>
> >>         usb_set_intfdata(intf, NULL);
> >>
> >>         if (dev) {
>
> Here. If you have a driver that uses usb_claim_interface().
> You need this check or you unregister an already unregistered
> netdev.

Sorry, but with all my best intentions, I still do not get it. During
the second iteration, intf is NULL and:

        /* equivalent to dev = intf->dev.data. Because intf is NULL,
         * this is a NULL pointer dereference */
        struct ems_usb *dev = usb_get_intfdata(intf);

        /* OK, intf is already NULL */
        usb_set_intfdata(intf, NULL);

        /* follows a NULL pointer dereference so this is undefined
         * behaviour */
        if (dev) {

How is this a valid check that you entered the function for the second
time? If intf is the flag, you should check intf, not dev? Something
like this:

        struct ems_usb *dev;

        if (!intf)
                return;

        dev = usb_get_intfdata(intf);
        /* ... */

I just can not see the connection between intf being NULL and the if
(dev) check. All I see is some undefined behaviour, sorry.

> The way this disconnect() method is coded is extremely defensive.
> Most drivers do not need this check. But it is never
> wrong in the strict sense.
>
> Hence doing a mass removal with a change log that does
> not say that this driver is using only a single interface
> hence the check can be dropped to reduce code size
> is not good.
>
>         Regards
>                 Oliver
On Fri, Dec 09, 2022 at 12:44:51AM +0900, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
> > >> which is likely, then please also remove checks like this:
> > >>
> > >>         struct ems_usb *dev = usb_get_intfdata(intf);
> > >>
> > >>         usb_set_intfdata(intf, NULL);
> > >>
> > >>         if (dev) {
> >
> > Here. If you have a driver that uses usb_claim_interface().
> > You need this check or you unregister an already unregistered
> > netdev.
>
> Sorry, but with all my best intentions, I still do not get it. During
> the second iteration, intf is NULL and:

No, intf is never NULL. Rather, the driver-specific pointer stored in
intfdata may be NULL.

You seem to be confusing intf with intfdata(intf).

>         /* equivalent to dev = intf->dev.data. Because intf is NULL,
>          * this is a NULL pointer dereference */
>         struct ems_usb *dev = usb_get_intfdata(intf);

So here dev will be NULL when the second interface's disconnect routine
runs, because the first time through the routine sets the intfdata to
NULL for both interfaces:

        USB core calls ->disconnect(intf1)

                disconnect routine sets intfdata(intf1) and
                intfdata(intf2) both to NULL and handles the
                disconnection

        USB core calls ->disconnect(intf2)

                disconnect routine sees that intfdata(intf2) is
                already NULL, so it knows that it doesn't need to do
                anything more.

As you can see in this scenario, neither intf1 nor intf2 is ever NULL.

>         /* OK, intf is already NULL */
>         usb_set_intfdata(intf, NULL);
>
>         /* follows a NULL pointer dereference so this is undefined
>          * behaviour */
>         if (dev) {
>
> How is this a valid check that you entered the function for the second
> time? If intf is the flag, you should check intf, not dev? Something
> like this:

intf is not a flag; it is the argument to the function and is never
NULL. The flag is the intfdata.

>         struct ems_usb *dev;
>
>         if (!intf)
>                 return;
>
>         dev = usb_get_intfdata(intf);
>         /* ... */
>
> I just can not see the connection between intf being NULL and the if
> (dev) check. All I see is some undefined behaviour, sorry.

Once you get it straightened out in your head, you will understand.

Alan Stern
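The two-interface sequence Alan describes can be modelled in userspace. This is a sketch under stated assumptions, not code from any driver: mock_intf, mock_priv, mock_bind() and mock_disconnect() are hypothetical names, struct mock_intf keeps only the driver-data slot of struct usb_interface, and a counter stands in for the real teardown work.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct usb_interface: only the drvdata slot. */
struct mock_intf {
	void *drvdata;
};

/* Hypothetical private data of a driver that owns two interfaces. */
struct mock_priv {
	struct mock_intf *intf1;
	struct mock_intf *intf2;
	int teardown_count; /* how many times the real teardown ran */
};

static void *mock_get_intfdata(struct mock_intf *intf)
{
	return intf->drvdata;
}

static void mock_set_intfdata(struct mock_intf *intf, void *data)
{
	intf->drvdata = data;
}

/* Models probe() plus claiming the second interface. */
static void mock_bind(struct mock_priv *priv, struct mock_intf *a,
		      struct mock_intf *b)
{
	priv->intf1 = a;
	priv->intf2 = b;
	priv->teardown_count = 0;
	mock_set_intfdata(a, priv);
	mock_set_intfdata(b, priv);
}

/* Called once per interface, in either order. */
static void mock_disconnect(struct mock_intf *intf)
{
	/* intf itself is never NULL; only the pointer stored in it can be. */
	struct mock_priv *priv = mock_get_intfdata(intf);

	if (!priv)
		return; /* the sibling interface already did the teardown */

	/* Clear the flag on BOTH interfaces before tearing down. */
	mock_set_intfdata(priv->intf1, NULL);
	mock_set_intfdata(priv->intf2, NULL);

	priv->teardown_count++; /* stands in for unregister/free */
}
```

Calling mock_disconnect() on both interfaces, in either order, runs the teardown exactly once: the second call reads a NULL intfdata from a perfectly valid intf and returns early.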
On 08.12.22 16:44, Vincent MAILHOL wrote:
> On Thu. 8 Dec. 2022 at 20:04, Oliver Neukum <oneukum@suse.com> wrote:
>> On 08.12.22 10:00, Vincent MAILHOL wrote:
>>> On Mon. 5 Dec. 2022 at 17:39, Oliver Neukum <oneukum@suse.com> wrote:
>>>> On 03.12.22 14:31, Vincent Mailhol wrote:
>>
>> Good Morning!
>
> Good night! (different time zone :))

Good evening!

> How do you check that disconnect() has proceeded *to a given point*
> using intf without being racy? You can check if it has already
> completed once but not check how far it has proceeded, right?

You'd use intfdata, which is a pointer stored in intf.
But other than that the simplest way would be to use a mutex.

        Regards
                Oliver
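The mutex approach mentioned above might look roughly like the following userspace analogue. It is only a sketch: a pthread mutex stands in for a kernel struct mutex, and struct mock_dev, its disconnected flag, and the mock_* functions are hypothetical names chosen for illustration, not taken from any driver.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device state shared between disconnect() and other paths. */
struct mock_dev {
	pthread_mutex_t lock;
	bool disconnected; /* set once disconnect() passed the critical point */
	int io_submitted;  /* counts I/O requests that were accepted */
};

static void mock_dev_init(struct mock_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->disconnected = false;
	dev->io_submitted = 0;
}

/* Marks the point after which no new I/O may be started. */
static void mock_disconnect(struct mock_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->disconnected = true;
	pthread_mutex_unlock(&dev->lock);

	/* ... the remaining teardown would follow here ... */
}

/* Returns 0 if the I/O was accepted, -1 if the device is already gone. */
static int mock_submit_io(struct mock_dev *dev)
{
	int ret = -1;

	pthread_mutex_lock(&dev->lock);
	if (!dev->disconnected) {
		dev->io_submitted++;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);

	return ret;
}
```

Because the flag is read and written only while the lock is held, a caller of mock_submit_io() sees either the state before the disconnect point or the state after it, never something in between.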
Hi,

Thanks Alan and Oliver for your patience, really appreciated. And
sorry that it took me four messages to realize my mistake.

I will send a v2 right now.