Message ID | 20220627135501.713980-1-jiri@resnulli.us (mailing list archive) |
---|---|
Series | net: devlink: remove devlink big lock |
On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
>
> This is an attempt to remove use of devlink_mutex. This is a global lock
> taken for every user command. That causes long operations performed
> on one devlink instance (like flash update) to block other
> operations on different instances.

This patchset is supposed to prevent one devlink instance from blocking
another? Devlink does not enable "parallel_ops", which means that the
generic netlink mutex is serializing all user space operations. AFAICT,
this series does not enable "parallel_ops", so I'm not sure what
difference the removal of the devlink mutex makes.

The devlink mutex (in accordance with the comment above it) serializes
all user space operations and accesses to the devlink devices list. This
resulted in an AA deadlock in the previous submission because we had a
flow where a user space operation (which acquires this mutex) also tries
to register / unregister a nested devlink instance which also tries to
acquire the mutex.

As long as devlink does not implement "parallel_ops", it seems that the
devlink mutex can be reduced to only serializing accesses to the devlink
devices list, thereby eliminating the deadlock.

>
> The first patch makes sure that the xarray that holds devlink pointers
> can be safely iterated.
>
> The second patch moves the user command mutex to be per-devlink.
>
> Jiri Pirko (2):
>   net: devlink: make sure that devlink_try_get() works with valid
>     pointer during xarray iteration
>   net: devlink: replace devlink_mutex by per-devlink lock
>
>  net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 161 insertions(+), 95 deletions(-)
>
> --
> 2.35.3
>
Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
>On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@nvidia.com>
>>
>> This is an attempt to remove use of devlink_mutex. This is a global lock
>> taken for every user command. That causes long operations performed
>> on one devlink instance (like flash update) to block other
>> operations on different instances.
>
>This patchset is supposed to prevent one devlink instance from blocking
>another? Devlink does not enable "parallel_ops", which means that the
>generic netlink mutex is serializing all user space operations. AFAICT,
>this series does not enable "parallel_ops", so I'm not sure what
>difference the removal of the devlink mutex makes.

You are correct, that is missing. For me, as a side effect, this patchset
resolved the deadlock for the LC auxdev you pointed out. That was my
motivation for this patchset :)

>
>The devlink mutex (in accordance with the comment above it) serializes
>all user space operations and accesses to the devlink devices list. This
>resulted in an AA deadlock in the previous submission because we had a
>flow where a user space operation (which acquires this mutex) also tries
>to register / unregister a nested devlink instance which also tries to
>acquire the mutex.
>
>As long as devlink does not implement "parallel_ops", it seems that the
>devlink mutex can be reduced to only serializing accesses to the devlink
>devices list, thereby eliminating the deadlock.
>
>>
>> The first patch makes sure that the xarray that holds devlink pointers
>> can be safely iterated.
>>
>> The second patch moves the user command mutex to be per-devlink.
>>
>> Jiri Pirko (2):
>>   net: devlink: make sure that devlink_try_get() works with valid
>>     pointer during xarray iteration
>>   net: devlink: replace devlink_mutex by per-devlink lock
>>
>>  net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
>>  1 file changed, 161 insertions(+), 95 deletions(-)
>>
>> --
>> 2.35.3
>>
On Mon, 27 Jun 2022 18:41:31 +0300 Ido Schimmel wrote:
> On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
> > This is an attempt to remove use of devlink_mutex. This is a global lock
> > taken for every user command. That causes long operations performed
> > on one devlink instance (like flash update) to block other
> > operations on different instances.
>
> This patchset is supposed to prevent one devlink instance from blocking
> another? Devlink does not enable "parallel_ops", which means that the
> generic netlink mutex is serializing all user space operations. AFAICT,
> this series does not enable "parallel_ops", so I'm not sure what
> difference the removal of the devlink mutex makes.
>
> The devlink mutex (in accordance with the comment above it) serializes
> all user space operations and accesses to the devlink devices list. This
> resulted in an AA deadlock in the previous submission because we had a
> flow where a user space operation (which acquires this mutex) also tries
> to register / unregister a nested devlink instance which also tries to
> acquire the mutex.
>
> As long as devlink does not implement "parallel_ops", it seems that the
> devlink mutex can be reduced to only serializing accesses to the devlink
> devices list, thereby eliminating the deadlock.

I'm unclear on why we can't wait for the mlx5 locking rework which will
allow us to move completely to per-instance locks. Do you have extra
insights into how that work is progressing? I was hoping that it will
be complete in the next two months.
Mon, Jun 27, 2022 at 07:49:45PM CEST, kuba@kernel.org wrote:
>On Mon, 27 Jun 2022 18:41:31 +0300 Ido Schimmel wrote:
>> On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>> > This is an attempt to remove use of devlink_mutex. This is a global lock
>> > taken for every user command. That causes long operations performed
>> > on one devlink instance (like flash update) to block other
>> > operations on different instances.
>>
>> This patchset is supposed to prevent one devlink instance from blocking
>> another? Devlink does not enable "parallel_ops", which means that the
>> generic netlink mutex is serializing all user space operations. AFAICT,
>> this series does not enable "parallel_ops", so I'm not sure what
>> difference the removal of the devlink mutex makes.
>>
>> The devlink mutex (in accordance with the comment above it) serializes
>> all user space operations and accesses to the devlink devices list. This
>> resulted in an AA deadlock in the previous submission because we had a
>> flow where a user space operation (which acquires this mutex) also tries
>> to register / unregister a nested devlink instance which also tries to
>> acquire the mutex.
>>
>> As long as devlink does not implement "parallel_ops", it seems that the
>> devlink mutex can be reduced to only serializing accesses to the devlink
>> devices list, thereby eliminating the deadlock.
>
>I'm unclear on why we can't wait for the mlx5 locking rework which will

Sure we can, no rush.

>allow us to move completely to per-instance locks. Do you have extra
>insights into how that work is progressing? I was hoping that it will

It's under internal review afaik.

>be complete in the next two months.

What do you mean exactly? Do you mean that we would be okay just with
devlink->lock? I don't think so. We need a user lock because we can't take
devlink->lock for port split and reload. devlink_mutex protects that now,
the devlink->cmd_lock I'm introducing here just replaces devlink_mutex.
If we can do without, that is fine. I just can't see how.

Also, I don't see the relation to the mlx5 work. What is that?
Tue, Jun 28, 2022 at 08:32:49AM CEST, jiri@resnulli.us wrote:
>Mon, Jun 27, 2022 at 07:49:45PM CEST, kuba@kernel.org wrote:
>>On Mon, 27 Jun 2022 18:41:31 +0300 Ido Schimmel wrote:
>>> On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>>> > This is an attempt to remove use of devlink_mutex. This is a global lock
>>> > taken for every user command. That causes long operations performed
>>> > on one devlink instance (like flash update) to block other
>>> > operations on different instances.
>>>
>>> This patchset is supposed to prevent one devlink instance from blocking
>>> another? Devlink does not enable "parallel_ops", which means that the
>>> generic netlink mutex is serializing all user space operations. AFAICT,
>>> this series does not enable "parallel_ops", so I'm not sure what
>>> difference the removal of the devlink mutex makes.
>>>
>>> The devlink mutex (in accordance with the comment above it) serializes
>>> all user space operations and accesses to the devlink devices list. This
>>> resulted in an AA deadlock in the previous submission because we had a
>>> flow where a user space operation (which acquires this mutex) also tries
>>> to register / unregister a nested devlink instance which also tries to
>>> acquire the mutex.
>>>
>>> As long as devlink does not implement "parallel_ops", it seems that the
>>> devlink mutex can be reduced to only serializing accesses to the devlink
>>> devices list, thereby eliminating the deadlock.
>>
>>I'm unclear on why we can't wait for the mlx5 locking rework which will
>
>Sure we can, no rush.
>
>>allow us to move completely to per-instance locks. Do you have extra
>>insights into how that work is progressing? I was hoping that it will
>
>It's under internal review afaik.
>
>>be complete in the next two months.
>
>What do you mean exactly? Do you mean that we would be okay just with
>devlink->lock? I don't think so. We need a user lock because we can't take
>devlink->lock for port split and reload. devlink_mutex protects that now,

Okay, I take back port split, that is already fixed. Moshe is taking
care of the rest (port_new/del, reporter_*). I will check out the
reload. Once we have that, you are correct, we are fine with the
devlink->lock instance lock. Thanks!

>the devlink->cmd_lock I'm introducing here just replaces devlink_mutex.
>If we can do without, that is fine. I just can't see how.
>
>Also, I don't see the relation to the mlx5 work. What is that?
On Mon, Jun 27, 2022 at 05:55:06PM +0200, Jiri Pirko wrote:
> Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
> >On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@nvidia.com>
> >>
> >> This is an attempt to remove use of devlink_mutex. This is a global lock
> >> taken for every user command. That causes long operations performed
> >> on one devlink instance (like flash update) to block other
> >> operations on different instances.
> >
> >This patchset is supposed to prevent one devlink instance from blocking
> >another? Devlink does not enable "parallel_ops", which means that the
> >generic netlink mutex is serializing all user space operations. AFAICT,
> >this series does not enable "parallel_ops", so I'm not sure what
> >difference the removal of the devlink mutex makes.
>
> You are correct, that is missing. For me, as a side effect, this patchset
> resolved the deadlock for the LC auxdev you pointed out. That was my
> motivation for this patchset :)

Given that devlink does not enable "parallel_ops" and that the generic
netlink mutex is held throughout all callbacks, what prevents you from
simply dropping the devlink mutex now? IOW, why can't this series be
patch #1 and another patch that removes the devlink mutex?

> >
> >The devlink mutex (in accordance with the comment above it) serializes
> >all user space operations and accesses to the devlink devices list. This
> >resulted in an AA deadlock in the previous submission because we had a
> >flow where a user space operation (which acquires this mutex) also tries
> >to register / unregister a nested devlink instance which also tries to
> >acquire the mutex.
> >
> >As long as devlink does not implement "parallel_ops", it seems that the
> >devlink mutex can be reduced to only serializing accesses to the devlink
> >devices list, thereby eliminating the deadlock.
> >
> >>
> >> The first patch makes sure that the xarray that holds devlink pointers
> >> can be safely iterated.
> >>
> >> The second patch moves the user command mutex to be per-devlink.
> >>
> >> Jiri Pirko (2):
> >>   net: devlink: make sure that devlink_try_get() works with valid
> >>     pointer during xarray iteration
> >>   net: devlink: replace devlink_mutex by per-devlink lock
> >>
> >>  net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
> >>  1 file changed, 161 insertions(+), 95 deletions(-)
> >>
> >> --
> >> 2.35.3
> >>
Tue, Jun 28, 2022 at 09:43:26AM CEST, idosch@nvidia.com wrote:
>On Mon, Jun 27, 2022 at 05:55:06PM +0200, Jiri Pirko wrote:
>> Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
>> >On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >>
>> >> This is an attempt to remove use of devlink_mutex. This is a global lock
>> >> taken for every user command. That causes long operations performed
>> >> on one devlink instance (like flash update) to block other
>> >> operations on different instances.
>> >
>> >This patchset is supposed to prevent one devlink instance from blocking
>> >another? Devlink does not enable "parallel_ops", which means that the
>> >generic netlink mutex is serializing all user space operations. AFAICT,
>> >this series does not enable "parallel_ops", so I'm not sure what
>> >difference the removal of the devlink mutex makes.
>>
>> You are correct, that is missing. For me, as a side effect, this patchset
>> resolved the deadlock for the LC auxdev you pointed out. That was my
>> motivation for this patchset :)
>
>Given that devlink does not enable "parallel_ops" and that the generic
>netlink mutex is held throughout all callbacks, what prevents you from
>simply dropping the devlink mutex now? IOW, why can't this series be
>patch #1 and another patch that removes the devlink mutex?

Yep, I think you are correct. We are currently working with Moshe on
conversion of the commands that do not take devlink->lock (like health
reporters and reload) to take devlink->lock. Once we have that, we can
enable parallel_ops.

>
>> >
>> >The devlink mutex (in accordance with the comment above it) serializes
>> >all user space operations and accesses to the devlink devices list. This
>> >resulted in an AA deadlock in the previous submission because we had a
>> >flow where a user space operation (which acquires this mutex) also tries
>> >to register / unregister a nested devlink instance which also tries to
>> >acquire the mutex.
>> >
>> >As long as devlink does not implement "parallel_ops", it seems that the
>> >devlink mutex can be reduced to only serializing accesses to the devlink
>> >devices list, thereby eliminating the deadlock.
>> >
>> >>
>> >> The first patch makes sure that the xarray that holds devlink pointers
>> >> can be safely iterated.
>> >>
>> >> The second patch moves the user command mutex to be per-devlink.
>> >>
>> >> Jiri Pirko (2):
>> >>   net: devlink: make sure that devlink_try_get() works with valid
>> >>     pointer during xarray iteration
>> >>   net: devlink: replace devlink_mutex by per-devlink lock
>> >>
>> >>  net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
>> >>  1 file changed, 161 insertions(+), 95 deletions(-)
>> >>
>> >> --
>> >> 2.35.3
>> >>
Wed, Jun 29, 2022 at 12:25:49PM CEST, jiri@resnulli.us wrote:
>Tue, Jun 28, 2022 at 09:43:26AM CEST, idosch@nvidia.com wrote:
>>On Mon, Jun 27, 2022 at 05:55:06PM +0200, Jiri Pirko wrote:
>>> Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
>>> >On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>>> >> From: Jiri Pirko <jiri@nvidia.com>
>>> >>
>>> >> This is an attempt to remove use of devlink_mutex. This is a global lock
>>> >> taken for every user command. That causes long operations performed
>>> >> on one devlink instance (like flash update) to block other
>>> >> operations on different instances.
>>> >
>>> >This patchset is supposed to prevent one devlink instance from blocking
>>> >another? Devlink does not enable "parallel_ops", which means that the
>>> >generic netlink mutex is serializing all user space operations. AFAICT,
>>> >this series does not enable "parallel_ops", so I'm not sure what
>>> >difference the removal of the devlink mutex makes.
>>>
>>> You are correct, that is missing. For me, as a side effect, this patchset
>>> resolved the deadlock for the LC auxdev you pointed out. That was my
>>> motivation for this patchset :)
>>
>>Given that devlink does not enable "parallel_ops" and that the generic
>>netlink mutex is held throughout all callbacks, what prevents you from
>>simply dropping the devlink mutex now? IOW, why can't this series be
>>patch #1 and another patch that removes the devlink mutex?
>
>Yep, I think you are correct. We are currently working with Moshe on

Okay, I see the problem with what you suggested: devlink_pernet_pre_exit().
There, devlink_mutex is taken to protect against simultaneous cmds
being executed. That will be fixed with the reload conversion to take
devlink->lock.

>conversion of the commands that do not take devlink->lock (like health
>reporters and reload) to take devlink->lock. Once we have that, we can
>enable parallel_ops.

>
>>
>>> >
>>> >The devlink mutex (in accordance with the comment above it) serializes
>>> >all user space operations and accesses to the devlink devices list. This
>>> >resulted in an AA deadlock in the previous submission because we had a
>>> >flow where a user space operation (which acquires this mutex) also tries
>>> >to register / unregister a nested devlink instance which also tries to
>>> >acquire the mutex.
>>> >
>>> >As long as devlink does not implement "parallel_ops", it seems that the
>>> >devlink mutex can be reduced to only serializing accesses to the devlink
>>> >devices list, thereby eliminating the deadlock.
>>> >
>>> >>
>>> >> The first patch makes sure that the xarray that holds devlink pointers
>>> >> can be safely iterated.
>>> >>
>>> >> The second patch moves the user command mutex to be per-devlink.
>>> >>
>>> >> Jiri Pirko (2):
>>> >>   net: devlink: make sure that devlink_try_get() works with valid
>>> >>     pointer during xarray iteration
>>> >>   net: devlink: replace devlink_mutex by per-devlink lock
>>> >>
>>> >>  net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
>>> >>  1 file changed, 161 insertions(+), 95 deletions(-)
>>> >>
>>> >> --
>>> >> 2.35.3
>>> >>
On Wed, Jun 29, 2022 at 12:36:24PM +0200, Jiri Pirko wrote:
> Wed, Jun 29, 2022 at 12:25:49PM CEST, jiri@resnulli.us wrote:
> >Tue, Jun 28, 2022 at 09:43:26AM CEST, idosch@nvidia.com wrote:
> >>On Mon, Jun 27, 2022 at 05:55:06PM +0200, Jiri Pirko wrote:
> >>> Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
> >>> >On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
> >>> >> From: Jiri Pirko <jiri@nvidia.com>
> >>> >>
> >>> >> This is an attempt to remove use of devlink_mutex. This is a global lock
> >>> >> taken for every user command. That causes long operations performed
> >>> >> on one devlink instance (like flash update) to block other
> >>> >> operations on different instances.
> >>> >
> >>> >This patchset is supposed to prevent one devlink instance from blocking
> >>> >another? Devlink does not enable "parallel_ops", which means that the
> >>> >generic netlink mutex is serializing all user space operations. AFAICT,
> >>> >this series does not enable "parallel_ops", so I'm not sure what
> >>> >difference the removal of the devlink mutex makes.
> >>>
> >>> You are correct, that is missing. For me, as a side effect, this patchset
> >>> resolved the deadlock for the LC auxdev you pointed out. That was my
> >>> motivation for this patchset :)
> >>
> >>Given that devlink does not enable "parallel_ops" and that the generic
> >>netlink mutex is held throughout all callbacks, what prevents you from
> >>simply dropping the devlink mutex now? IOW, why can't this series be
> >>patch #1 and another patch that removes the devlink mutex?
> >
> >Yep, I think you are correct. We are currently working with Moshe on
>
> Okay, I see the problem with what you suggested: devlink_pernet_pre_exit().
> There, devlink_mutex is taken to protect against simultaneous cmds
> being executed. That will be fixed with the reload conversion to take
> devlink->lock.

OK, so this lock does not actually protect against simultaneous user
space operations (this is handled by the generic netlink mutex).
Instead, it protects against user space operations during netns
dismantle.

IIUC, the current plan is:

1. Get the devlink->lock rework done. Devlink will hold the lock for
every operation invocation and drivers will hold it while calling into
devlink via devl_lock().

This means 'DEVLINK_NL_FLAG_NO_LOCK' is removed and the lock will also
be taken in netns dismantle.

2. At this stage, the devlink mutex is only taken in devlink_register()
/ devlink_unregister() and some form of patch #1 will take care of that
so that this mutex can be removed.

3. Enable "parallel_ops"

?
Wed, Jun 29, 2022 at 01:30:07PM CEST, idosch@nvidia.com wrote:
>On Wed, Jun 29, 2022 at 12:36:24PM +0200, Jiri Pirko wrote:
>> Wed, Jun 29, 2022 at 12:25:49PM CEST, jiri@resnulli.us wrote:
>> >Tue, Jun 28, 2022 at 09:43:26AM CEST, idosch@nvidia.com wrote:
>> >>On Mon, Jun 27, 2022 at 05:55:06PM +0200, Jiri Pirko wrote:
>> >>> Mon, Jun 27, 2022 at 05:41:31PM CEST, idosch@nvidia.com wrote:
>> >>> >On Mon, Jun 27, 2022 at 03:54:59PM +0200, Jiri Pirko wrote:
>> >>> >> From: Jiri Pirko <jiri@nvidia.com>
>> >>> >>
>> >>> >> This is an attempt to remove use of devlink_mutex. This is a global lock
>> >>> >> taken for every user command. That causes long operations performed
>> >>> >> on one devlink instance (like flash update) to block other
>> >>> >> operations on different instances.
>> >>> >
>> >>> >This patchset is supposed to prevent one devlink instance from blocking
>> >>> >another? Devlink does not enable "parallel_ops", which means that the
>> >>> >generic netlink mutex is serializing all user space operations. AFAICT,
>> >>> >this series does not enable "parallel_ops", so I'm not sure what
>> >>> >difference the removal of the devlink mutex makes.
>> >>>
>> >>> You are correct, that is missing. For me, as a side effect, this patchset
>> >>> resolved the deadlock for the LC auxdev you pointed out. That was my
>> >>> motivation for this patchset :)
>> >>
>> >>Given that devlink does not enable "parallel_ops" and that the generic
>> >>netlink mutex is held throughout all callbacks, what prevents you from
>> >>simply dropping the devlink mutex now? IOW, why can't this series be
>> >>patch #1 and another patch that removes the devlink mutex?
>> >
>> >Yep, I think you are correct. We are currently working with Moshe on
>>
>> Okay, I see the problem with what you suggested: devlink_pernet_pre_exit().
>> There, devlink_mutex is taken to protect against simultaneous cmds
>> being executed. That will be fixed with the reload conversion to take
>> devlink->lock.
>
>OK, so this lock does not actually protect against simultaneous user
>space operations (this is handled by the generic netlink mutex).
>Instead, it protects against user space operations during netns
>dismantle.
>
>IIUC, the current plan is:
>
>1. Get the devlink->lock rework done. Devlink will hold the lock for
>every operation invocation and drivers will hold it while calling into
>devlink via devl_lock().
>
>This means 'DEVLINK_NL_FLAG_NO_LOCK' is removed and the lock will also
>be taken in netns dismantle.
>
>2. At this stage, the devlink mutex is only taken in devlink_register()
>/ devlink_unregister() and some form of patch #1 will take care of that
>so that this mutex can be removed.
>
>3. Enable "parallel_ops"

Yes, exactly. With devlink_mutex removal in between 2 and 3.

>
>?
From: Jiri Pirko <jiri@nvidia.com>

This is an attempt to remove use of devlink_mutex. This is a global lock
taken for every user command. That causes long operations performed on
one devlink instance (like flash update) to block other operations on
different instances.

The first patch makes sure that the xarray that holds devlink pointers
can be safely iterated.

The second patch moves the user command mutex to be per-devlink.

Jiri Pirko (2):
  net: devlink: make sure that devlink_try_get() works with valid
    pointer during xarray iteration
  net: devlink: replace devlink_mutex by per-devlink lock

 net/core/devlink.c | 256 ++++++++++++++++++++++++++++-----------------
 1 file changed, 161 insertions(+), 95 deletions(-)