Message ID | 20231018154804.420823-1-atenart@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | net-sysfs: remove rtnl_trylock/restart_syscall use | expand |
On Wed, 18 Oct 2023 17:47:42 +0200 Antoine Tenart <atenart@kernel.org> wrote: > Hi, > > This is sent as an RFC because I believe this should be discussed (and > some might want to do additional testing), but the code itself is ready. > > Some time ago we tried to improve the rtnl_trylock/restart_syscall > situation[1]. What happens is when there is rtnl contention, userspace > accessing net sysfs attributes will spin and experience delays. This can > happen in different situations, when sysfs attributes are accessed > (networking daemon, configuration, monitoring) while operations under > rtnl are performed (veth creation, driver configuration, etc). A few > improvements can be done in userspace to ease things, like using the > netlink interface instead, or polling less (or more selectively) the > attributes; but in the end the root cause is always there and this keeps > happening from time to time. > > That initial effort however wasn't successful, although I think there > was an interest, mostly because we found technical flaws and didn't find > a working solution at the time. Some time later, we gave it a new try > and found something more promising, but the patches fell off my radar. I > recently had another look at this series, made more tests and cleaned it > up. > > The technical aspect is described in patch 1 directly in the code > comments, with an additional important comment in patch 3. This was > mostly tested by stress-testing net sysfs attributes (read/write ops) > while adding/removing queues and adding/removing veths, all in parallel. > > All comments are welcomed. The trylock was introduced to deal with lock inversion. It is not clear how this more complex solution prevents that.
Quoting Stephen Hemminger (2023-10-18 17:57:17) > The trylock was introduced to deal with lock inversion. > It is not clear how this more complex solution prevents that. Anything specifically in the patch 1 comments is not clear that I can improve? The dead lock happens between rtnl_lock and the refcounting on the attribute kn->active, specifically when unregistering net devices because device_del kernfs_drain will wait for the kn->active refcount to reach KN_DEACTIVATED_BIAS, under an rtnl section. The current solution was making one path to bail out (trylock/restart syscall). The idea here is we can actually bail out of the attribute kn protection (kn->active), while still letting unregistering net devices to wait for the current sysfs operations to complete, by using the net device refcount instead. To simplify, instead of waiting on kn->active in the net device unregistration step, this waits on the net device refcount (netdev_wait_allrefs_any), which is done outside an rtnl section. This way kernfs_drain can complete under its rtnl section even if a call to rtnl_lock is waiting in a sysfs operation. Antoine
On Wed, 18 Oct 2023 17:47:42 +0200 Antoine Tenart <atenart@kernel.org> wrote: > Some time ago we tried to improve the rtnl_trylock/restart_syscall > situation[1]. What happens is when there is rtnl contention, userspace > accessing net sysfs attributes will spin and experience delays. This can > happen in different situations, when sysfs attributes are accessed > (networking daemon, configuration, monitoring) while operations under > rtnl are performed (veth creation, driver configuration, etc). A few > improvements can be done in userspace to ease things, like using the > netlink interface instead, or polling less (or more selectively) the > attributes; but in the end the root cause is always there and this keeps > happening from time to time. What attribute is not exposed by netlink, and only by sysfs? There will always be more overhead using sysfs. That doesn't mean the locking should not be fixed, just that better to avoid the situation if possible.
Quoting Stephen Hemminger (2023-10-18 20:15:47) > On Wed, 18 Oct 2023 17:47:42 +0200 > Antoine Tenart <atenart@kernel.org> wrote: > > > Some time ago we tried to improve the rtnl_trylock/restart_syscall > > situation[1]. What happens is when there is rtnl contention, userspace > > accessing net sysfs attributes will spin and experience delays. This can > > happen in different situations, when sysfs attributes are accessed > > (networking daemon, configuration, monitoring) while operations under > > rtnl are performed (veth creation, driver configuration, etc). A few > > improvements can be done in userspace to ease things, like using the > > netlink interface instead, or polling less (or more selectively) the > > attributes; but in the end the root cause is always there and this keeps > > happening from time to time. > > What attribute is not exposed by netlink, and only by sysfs? I think there were a few last time (a while a go) I checked, but it's not an issue IMHO, if one is missing in netlink we can add it. > That doesn't mean the locking should not be fixed, just that better > to avoid the situation if possible. 100% agree on this one. I believe using netlink is the right way. Having said that, sysfs is still there and (quite some time ago) while having discussions with different projects, some were keen to switch to netlink, but some weren't and pushed back because "sysfs is a stable API" and "if there is a kernel issue it should be fixed in the kernel". Not blaming anyone really, they'd have to support the two interfaces for compatibility. My point is, yes, I would encourage everyone to use netlink too, but we don't control every user and it's not like sysfs will disappear anytime soon. Thanks, Antoine
On Thu, 19 Oct 2023 09:47:27 +0200 Antoine Tenart <atenart@kernel.org> wrote: > > That doesn't mean the locking should not be fixed, just that better > > to avoid the situation if possible. > > 100% agree on this one. I believe using netlink is the right way. > > Having said that, sysfs is still there and (quite some time ago) while > having discussions with different projects, some were keen to switch to > netlink, but some weren't and pushed back because "sysfs is a stable > API" and "if there is a kernel issue it should be fixed in the kernel". > Not blaming anyone really, they'd have to support the two interfaces for > compatibility. My point is, yes, I would encourage everyone to use > netlink too, but we don't control every user and it's not like sysfs > will disappear anytime soon. I have seen code doing discovery of new devices via netlink then poking around in sysfs. But that usage is inherently racy from the application point of view. By the time device is discovered, it might be removed or worse renamed before the sysfs operations.