Message ID | 20241106063542.357743-1-houtao@huaweicloud.com |
---|---|
Series | Fix lockdep warning for htab of map |
On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,

Hi Hou,

> The patch set fixes a lockdep warning for htab of map. The
> warning is found when running test_maps. The warning occurs when
> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> of the inner map while already holding the bucket lock (raw_spinlock_t).
>
> The fix moves the invocation of free_htab_elem() after
> htab_unlock_bucket() and adds a test case to verify the solution. Please
> see the individual patches for details. Comments are always welcome.

Thank you.

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I've seen that you didn't move check_and_free_fields() out of the bucket
locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
happens to run on a remote CPU. On PREEMPT_RT this will acquire a
sleeping lock, which is problematic due to the raw_spinlock_t.
Would it be okay to clean up the timer unconditionally via the
workqueue?

Sebastian
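For context, the nesting described above looks roughly like the call chain
below. This is only a sketch pieced together from kernel/bpf/hashtab.c,
kernel/bpf/helpers.c and kernel/time/hrtimer.c around this kernel version;
the intermediate helpers are approximate and may differ between releases.

htab_map_update_elem()                       /* pre-allocated htab, existing key   */
  htab_lock_bucket()                         /* bucket lock is a raw_spinlock_t    */
    check_and_free_fields()
      bpf_obj_free_fields()                  /* element has a BPF_TIMER field      */
        bpf_timer_cancel_and_free()
          bpf_timer_delete_work()            /* direct call when not in a timer cb */
            hrtimer_cancel()
              hrtimer_cancel_wait_running()  /* takes cpu_base->softirq_expiry_lock,
                                                a sleeping lock on PREEMPT_RT      */
  htab_unlock_bucket()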
Hi,

On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
> Hi Hou,
>
>> The patch set fixes a lockdep warning for htab of map. The
>> warning is found when running test_maps. The warning occurs when
>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
>> of the inner map while already holding the bucket lock (raw_spinlock_t).
>>
>> The fix moves the invocation of free_htab_elem() after
>> htab_unlock_bucket() and adds a test case to verify the solution. Please
>> see the individual patches for details. Comments are always welcome.
> Thank you.
>
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> I've seen that you didn't move check_and_free_fields() out of the bucket
> locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
> happens to run on a remote CPU. On PREEMPT_RT this will acquire a
> sleeping lock, which is problematic due to the raw_spinlock_t.
> Would it be okay to clean up the timer unconditionally via the
> workqueue?

Yes. The patch set still invokes check_and_free_fields() under the
bucket lock when updating an existing element in a pre-allocated htab. I
missed the hrtimer case. For the sleeping lock, you mean the
cpu_base->softirq_expiry_lock in hrtimer_cancel_wait_running(), right?
Instead of cancelling the timer in a workqueue, maybe we could save the
old value temporarily under the bucket lock and try to free it outside
of the bucket lock, or disable the extra_elems logic temporarily for
this case?

> Sebastian
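To make the "save the old value under the lock, free it outside" idea
concrete, here is a minimal sketch of what such an update/delete path could
look like. This is hypothetical code, not part of the series: it reuses the
helper names from kernel/bpf/hashtab.c (htab_lock_bucket(),
lookup_elem_raw(), free_htab_elem()), omits the surrounding declarations and
error handling, and glosses over how a pre-allocated element would be
returned to extra_elems or the freelist.

	ret = htab_lock_bucket(htab, b, hash, &flags);
	if (ret)
		return ret;

	l_old = lookup_elem_raw(head, hash, key, key_size);
	if (l_old)
		/* Only unlink here; calling check_and_free_fields() or
		 * free_htab_elem() now would nest hrtimer_cancel() and
		 * map_idr_lock under the raw_spinlock_t.
		 */
		hlist_nulls_del_rcu(&l_old->hash_node);

	/* ... link the new element in as before ... */

	htab_unlock_bucket(htab, b, hash, flags);

	if (l_old)
		/* The bucket lock is released, so the timer cancel and the
		 * map-id release can use sleeping/non-raw locks safely.
		 */
		free_htab_elem(htab, l_old);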
On 2024-11-06 17:48:59 [+0800], Hou Tao wrote:
> Hi,
Hi,

> Yes. The patch set still invokes check_and_free_fields() under the
> bucket lock when updating an existing element in a pre-allocated htab. I
> missed the hrtimer case. For the sleeping lock, you mean the
> cpu_base->softirq_expiry_lock in hrtimer_cancel_wait_running(), right?
Yes.

> Instead of cancelling the timer in a workqueue, maybe we could save the
> old value temporarily under the bucket lock and try to free it outside
> of the bucket lock, or disable the extra_elems logic temporarily for
> this case?
Well, it is up to you. Either:

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab286..b077af12fc9b4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1593,10 +1593,7 @@ void bpf_timer_cancel_and_free(void *val)
 	 * To avoid these issues, punt to workqueue context when we are in a
 	 * timer callback.
 	 */
-	if (this_cpu_read(hrtimer_running))
-		queue_work(system_unbound_wq, &t->cb.delete_work);
-	else
-		bpf_timer_delete_work(&t->cb.delete_work);
+	queue_work(system_unbound_wq, &t->cb.delete_work);
 }
 
 /* This function is called by map_delete/update_elem for individual element and

Or something smarter where you cancel the timer outside of the bucket
lock.

Sebastian
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 6 Nov 2024 14:35:39 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The patch set fixes a lockdep warning for htab of map. The
> warning is found when running test_maps. The warning occurs when
> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> of the inner map while already holding the bucket lock (raw_spinlock_t).
>
> [...]

Here is the summary with links:
  - [bpf-next,1/3] bpf: Call free_htab_elem() after htab_unlock_bucket()
    https://git.kernel.org/bpf/bpf-next/c/81eef03cbf70
  - [bpf-next,2/3] selftests/bpf: Move ENOTSUPP from bpf_util.h
    https://git.kernel.org/bpf/bpf-next/c/7d5e83b4d41b
  - [bpf-next,3/3] selftests/bpf: Test the update operations for htab of maps
    https://git.kernel.org/bpf/bpf-next/c/ff605ec69735

You are awesome, thank you!
On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> > On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> > Hi Hou,
> >
> >> The patch set fixes a lockdep warning for htab of map. The
> >> warning is found when running test_maps. The warning occurs when
> >> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> >> of the inner map while already holding the bucket lock (raw_spinlock_t).
> >>
> >> The fix moves the invocation of free_htab_elem() after
> >> htab_unlock_bucket() and adds a test case to verify the solution. Please
> >> see the individual patches for details. Comments are always welcome.

The fix makes sense.
I manually resolved the merge conflict and applied it.

> > Thank you.
> >
> > Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > I've seen that you didn't move check_and_free_fields() out of the bucket
> > locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
> > happens to run on a remote CPU. On PREEMPT_RT this will acquire a
> > sleeping lock, which is problematic due to the raw_spinlock_t.
> > Would it be okay to clean up the timer unconditionally via the
> > workqueue?
>
> Yes. The patch set still invokes check_and_free_fields() under the
> bucket lock when updating an existing element in a pre-allocated htab. I
> missed the hrtimer case. For the sleeping lock, you mean the
> cpu_base->softirq_expiry_lock in hrtimer_cancel_wait_running(), right?
> Instead of cancelling the timer in a workqueue, maybe we could save the
> old value temporarily under the bucket lock and try to free it outside
> of the bucket lock, or disable the extra_elems logic temporarily for
> this case?

We definitely need to avoid spamming the wq when cancelling timers.
The wq may not be able to handle the volume.
Moving it outside of the bucket lock is certainly better.
Hi,

On 11/9/2024 3:52 AM, Alexei Starovoitov wrote:
> On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Hi,
>>> Hi Hou,
>>>
>>>> The patch set fixes a lockdep warning for htab of map. The
>>>> warning is found when running test_maps. The warning occurs when
>>>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
>>>> of the inner map while already holding the bucket lock (raw_spinlock_t).
>>>>
>>>> The fix moves the invocation of free_htab_elem() after
>>>> htab_unlock_bucket() and adds a test case to verify the solution. Please
>>>> see the individual patches for details. Comments are always welcome.
> The fix makes sense.
> I manually resolved the merge conflict and applied it.

Thanks for resolving the merge conflict manually. However, the patch set
doesn't move all free operations outside of the lock scope (e.g., for
bpf_map_lookup_and_delete()), because htab of maps doesn't support it. I
could post another patch set to do that.

>
>>> Thank you.
>>>
>>> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>>
>>> I've seen that you didn't move check_and_free_fields() out of the bucket
>>> locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
>>> happens to run on a remote CPU. On PREEMPT_RT this will acquire a
>>> sleeping lock, which is problematic due to the raw_spinlock_t.
>>> Would it be okay to clean up the timer unconditionally via the
>>> workqueue?
>> Yes. The patch set still invokes check_and_free_fields() under the
>> bucket lock when updating an existing element in a pre-allocated htab. I
>> missed the hrtimer case. For the sleeping lock, you mean the
>> cpu_base->softirq_expiry_lock in hrtimer_cancel_wait_running(), right?
>> Instead of cancelling the timer in a workqueue, maybe we could save the
>> old value temporarily under the bucket lock and try to free it outside
>> of the bucket lock, or disable the extra_elems logic temporarily for
>> this case?
> We definitely need to avoid spamming the wq when cancelling timers.
> The wq may not be able to handle the volume.
> Moving it outside of the bucket lock is certainly better.

OK. I will look into whether there is a better way to handle the
timer-cancelling case.
On Fri, Nov 8, 2024 at 5:48 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/9/2024 3:52 AM, Alexei Starovoitov wrote:
> > On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> Hi,
> >>
> >> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> >>> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> >>>> From: Hou Tao <houtao1@huawei.com>
> >>>>
> >>>> Hi,
> >>> Hi Hou,
> >>>
> >>>> The patch set fixes a lockdep warning for htab of map. The
> >>>> warning is found when running test_maps. The warning occurs when
> >>>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> >>>> of the inner map while already holding the bucket lock (raw_spinlock_t).
> >>>>
> >>>> The fix moves the invocation of free_htab_elem() after
> >>>> htab_unlock_bucket() and adds a test case to verify the solution. Please
> >>>> see the individual patches for details. Comments are always welcome.
> > The fix makes sense.
> > I manually resolved the merge conflict and applied it.
>
> Thanks for resolving the merge conflict manually. However, the patch set
> doesn't move all free operations outside of the lock scope (e.g., for
> bpf_map_lookup_and_delete()), because htab of maps doesn't support it. I
> could post another patch set to do that.

Pls do. It's easier to review incrementally.

For the last year we've been working on a "resilient spin lock" and hope
to post patches in a couple of weeks. In those patches we will be
replacing the htab bucket lock with this new lock, so moving other locks
out of the htab lock region fits very well.
From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set fixes a lockdep warning for htab of map. The
warning is found when running test_maps. The warning occurs when
htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
of the inner map while already holding the bucket lock (raw_spinlock_t).

The fix moves the invocation of free_htab_elem() after
htab_unlock_bucket() and adds a test case to verify the solution. Please
see the individual patches for details. Comments are always welcome.

Hou Tao (3):
  bpf: Call free_htab_elem() after htab_unlock_bucket()
  selftests/bpf: Move ENOTSUPP from bpf_util.h
  selftests/bpf: Test the update operations for htab of maps

 kernel/bpf/hashtab.c                          |  56 +++++---
 tools/testing/selftests/bpf/bpf_util.h        |   4 +
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |   4 -
 .../selftests/bpf/prog_tests/lsm_cgroup.c     |   4 -
 .../selftests/bpf/prog_tests/map_in_map.c     | 132 +++++++++++++++++-
 .../selftests/bpf/prog_tests/sock_addr.c      |   4 -
 .../selftests/bpf/progs/update_map_in_htab.c  |  30 ++++
 tools/testing/selftests/bpf/test_maps.c       |   4 -
 tools/testing/selftests/bpf/test_verifier.c   |   4 -
 9 files changed, 204 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/update_map_in_htab.c
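For reference, the path that triggers the lockdep warning described in this
cover letter is roughly the following call chain (helper names taken from
kernel/bpf/hashtab.c and kernel/bpf/syscall.c; the intermediate steps are
approximate):

htab_map_update_elem() or htab_map_delete_elem()
  htab_lock_bucket()                  /* raw_spinlock_t                          */
    free_htab_elem()                  /* before this series: called under lock   */
      htab_put_fd_value()
        map->ops->map_fd_put_ptr()    /* bpf_map_fd_put_ptr() for htab of maps   */
          bpf_map_put()               /* may drop the last inner-map reference   */
            bpf_map_free_id()         /* spin_lock_irqsave(&map_idr_lock, ...),
                                         a non-raw spinlock -> lockdep warning   */
  htab_unlock_bucket()

Patch 1 therefore only unlinks the element while the bucket lock is held and
calls free_htab_elem() after htab_unlock_bucket(), so map_idr_lock is no
longer taken inside the raw_spinlock_t.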