mbox series

[bpf-next,0/3] Fix lockdep warning for htab of map

Message ID 20241106063542.357743-1-houtao@huaweicloud.com (mailing list archive)
Headers show
Series Fix lockdep warning for htab of map | expand

Message

Hou Tao Nov. 6, 2024, 6:35 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

The patch set fixes a lockdep warning for htab of map. The
warning is found when running test_maps. The warning occurs when
htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
of the inner map while already holding the bucket lock (raw_spinlock_t).

The fix moves the invocation of free_htab_elem() after
htab_unlock_bucket() and adds a test case to verify the solution. Please
see the individual patches for details. Comments are always welcome.

Hou Tao (3):
  bpf: Call free_htab_elem() after htab_unlock_bucket()
  selftests/bpf: Move ENOTSUPP from bpf_util.h
  selftests/bpf: Test the update operations for htab of maps

 kernel/bpf/hashtab.c                          |  56 +++++---
 tools/testing/selftests/bpf/bpf_util.h        |   4 +
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |   4 -
 .../selftests/bpf/prog_tests/lsm_cgroup.c     |   4 -
 .../selftests/bpf/prog_tests/map_in_map.c     | 132 +++++++++++++++++-
 .../selftests/bpf/prog_tests/sock_addr.c      |   4 -
 .../selftests/bpf/progs/update_map_in_htab.c  |  30 ++++
 tools/testing/selftests/bpf/test_maps.c       |   4 -
 tools/testing/selftests/bpf/test_verifier.c   |   4 -
 9 files changed, 204 insertions(+), 38 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/update_map_in_htab.c

Comments

Sebastian Andrzej Siewior Nov. 6, 2024, 8:45 a.m. UTC | #1
On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
Hi Hou,

> The patch set fixes a lockdep warning for htab of map. The
> warning is found when running test_maps. The warning occurs when
> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> of the inner map while already holding the bucket lock (raw_spinlock_t).
> 
> The fix moves the invocation of free_htab_elem() after
> htab_unlock_bucket() and adds a test case to verify the solution. Please
> see the individual patches for details. Comments are always welcome.

Thank you.

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

I've seen that you didn't move check_and_free_fields() out of the bucket
locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
happens to run on a remote CPU. On PREEMPT_RT this will acquire a
sleeping lock which is problematic due to the raw_spinlock_t.
Would it be okay, to cleanup the timer unconditionally via the
workqueue?

Sebastian
Hou Tao Nov. 6, 2024, 9:48 a.m. UTC | #2
Hi,

On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
> Hi Hou,
>
>> The patch set fixes a lockdep warning for htab of map. The
>> warning is found when running test_maps. The warning occurs when
>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
>> of the inner map while already holding the bucket lock (raw_spinlock_t).
>>
>> The fix moves the invocation of free_htab_elem() after
>> htab_unlock_bucket() and adds a test case to verify the solution. Please
>> see the individual patches for details. Comments are always welcome.
> Thank you.
>
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> I've seen that you didn't move check_and_free_fields() out of the bucket
> locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
> happens to run on a remote CPU. On PREEMPT_RT this will acquire a
> sleeping lock which is problematic due to the raw_spinlock_t.
> Would it be okay, to cleanup the timer unconditionally via the
> workqueue?

Yes. The patch set still invokes check_and_free_fields() under the
bucket lock when updating an existing element in a pre-allocated htab. I
missed the hrtimer case. For the sleeping lock, you mean the
cpu_base->softirq_expiry_lock in hrtimer_cancel_waiting_running(), right
? Instead of cancelling the timer in workqueue, maybe we could save the
old value temporarily in the bucket lock, and try to free it outside of
the bucket lock or disabling the extra_elems logic temporarily for the
case ?
> Sebastian
Sebastian Andrzej Siewior Nov. 6, 2024, 10:05 a.m. UTC | #3
On 2024-11-06 17:48:59 [+0800], Hou Tao wrote:
> Hi,
Hi,

> Yes. The patch set still invokes check_and_free_fields() under the
> bucket lock when updating an existing element in a pre-allocated htab. I
> missed the hrtimer case. For the sleeping lock, you mean the
> cpu_base->softirq_expiry_lock in hrtimer_cancel_waiting_running(), right

Yes.

> ? Instead of cancelling the timer in workqueue, maybe we could save the
> old value temporarily in the bucket lock, and try to free it outside of
> the bucket lock or disabling the extra_elems logic temporarily for the
> case ?

Well, it is up to you. Either:

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab286..b077af12fc9b4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1593,10 +1593,7 @@ void bpf_timer_cancel_and_free(void *val)
 	 * To avoid these issues, punt to workqueue context when we are in a
 	 * timer callback.
 	 */
-	if (this_cpu_read(hrtimer_running))
-		queue_work(system_unbound_wq, &t->cb.delete_work);
-	else
-		bpf_timer_delete_work(&t->cb.delete_work);
+	queue_work(system_unbound_wq, &t->cb.delete_work);
 }
 
 /* This function is called by map_delete/update_elem for individual element and

Or something smarter where you cancel the timer outside of the bucket
lock.

Sebastian
patchwork-bot+netdevbpf@kernel.org Nov. 8, 2024, 7:50 p.m. UTC | #4
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed,  6 Nov 2024 14:35:39 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The patch set fixes a lockdep warning for htab of map. The
> warning is found when running test_maps. The warning occurs when
> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> of the inner map while already holding the bucket lock (raw_spinlock_t).
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/3] bpf: Call free_htab_elem() after htab_unlock_bucket()
    https://git.kernel.org/bpf/bpf-next/c/81eef03cbf70
  - [bpf-next,2/3] selftests/bpf: Move ENOTSUPP from bpf_util.h
    https://git.kernel.org/bpf/bpf-next/c/7d5e83b4d41b
  - [bpf-next,3/3] selftests/bpf: Test the update operations for htab of maps
    https://git.kernel.org/bpf/bpf-next/c/ff605ec69735

You are awesome, thank you!
Alexei Starovoitov Nov. 8, 2024, 7:52 p.m. UTC | #5
On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> > On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> > Hi Hou,
> >
> >> The patch set fixes a lockdep warning for htab of map. The
> >> warning is found when running test_maps. The warning occurs when
> >> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> >> of the inner map while already holding the bucket lock (raw_spinlock_t).
> >>
> >> The fix moves the invocation of free_htab_elem() after
> >> htab_unlock_bucket() and adds a test case to verify the solution. Please
> >> see the individual patches for details. Comments are always welcome.

The fix makes sense.
I manually resolved merge conflict and applied.

> > Thank you.
> >
> > Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > I've seen that you didn't move check_and_free_fields() out of the bucket
> > locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
> > happens to run on a remote CPU. On PREEMPT_RT this will acquire a
> > sleeping lock which is problematic due to the raw_spinlock_t.
> > Would it be okay, to cleanup the timer unconditionally via the
> > workqueue?
>
> Yes. The patch set still invokes check_and_free_fields() under the
> bucket lock when updating an existing element in a pre-allocated htab. I
> missed the hrtimer case. For the sleeping lock, you mean the
> cpu_base->softirq_expiry_lock in hrtimer_cancel_waiting_running(), right
> ? Instead of cancelling the timer in workqueue, maybe we could save the
> old value temporarily in the bucket lock, and try to free it outside of
> the bucket lock or disabling the extra_elems logic temporarily for the
> case ?

We definitely need to avoid spamming wq when cancelling timers.
wq may not be able to handle the volume.
Moving it outside of bucket lock is certainly better.
Hou Tao Nov. 9, 2024, 1:48 a.m. UTC | #6
Hi,

On 11/9/2024 3:52 AM, Alexei Starovoitov wrote:
> On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> Hi,
>>
>> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
>>> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Hi,
>>> Hi Hou,
>>>
>>>> The patch set fixes a lockdep warning for htab of map. The
>>>> warning is found when running test_maps. The warning occurs when
>>>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
>>>> of the inner map while already holding the bucket lock (raw_spinlock_t).
>>>>
>>>> The fix moves the invocation of free_htab_elem() after
>>>> htab_unlock_bucket() and adds a test case to verify the solution. Please
>>>> see the individual patches for details. Comments are always welcome.
> The fix makes sense.
> I manually resolved merge conflict and applied.

Thanks for the manually conflict resolving. However, the patch set
doesn't move all free operations outside of lock scope (e.g., for
bpf_map_lookup_and_delete()), because htab of maps doesn't support it. I
could post another patch set to do that.
>
>>> Thank you.
>>>
>>> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>>
>>> I've seen that you didn't move check_and_free_fields() out of the bucket
>>> locked section. Type BPF_TIMER does hrtimer_cancel() if the timer
>>> happens to run on a remote CPU. On PREEMPT_RT this will acquire a
>>> sleeping lock which is problematic due to the raw_spinlock_t.
>>> Would it be okay, to cleanup the timer unconditionally via the
>>> workqueue?
>> Yes. The patch set still invokes check_and_free_fields() under the
>> bucket lock when updating an existing element in a pre-allocated htab. I
>> missed the hrtimer case. For the sleeping lock, you mean the
>> cpu_base->softirq_expiry_lock in hrtimer_cancel_waiting_running(), right
>> ? Instead of cancelling the timer in workqueue, maybe we could save the
>> old value temporarily in the bucket lock, and try to free it outside of
>> the bucket lock or disabling the extra_elems logic temporarily for the
>> case ?
> We definitely need to avoid spamming wq when cancelling timers.
> wq may not be able to handle the volume.
> Moving it outside of bucket lock is certainly better.
> .

OK. Will try whether there is a better way to handle the cancelling of
timer case.
Alexei Starovoitov Nov. 9, 2024, 1:55 a.m. UTC | #7
On Fri, Nov 8, 2024 at 5:48 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 11/9/2024 3:52 AM, Alexei Starovoitov wrote:
> > On Wed, Nov 6, 2024 at 1:49 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> Hi,
> >>
> >> On 11/6/2024 4:45 PM, Sebastian Andrzej Siewior wrote:
> >>> On 2024-11-06 14:35:39 [+0800], Hou Tao wrote:
> >>>> From: Hou Tao <houtao1@huawei.com>
> >>>>
> >>>> Hi,
> >>> Hi Hou,
> >>>
> >>>> The patch set fixes a lockdep warning for htab of map. The
> >>>> warning is found when running test_maps. The warning occurs when
> >>>> htab_put_fd_value() attempts to acquire map_idr_lock to free the map id
> >>>> of the inner map while already holding the bucket lock (raw_spinlock_t).
> >>>>
> >>>> The fix moves the invocation of free_htab_elem() after
> >>>> htab_unlock_bucket() and adds a test case to verify the solution. Please
> >>>> see the individual patches for details. Comments are always welcome.
> > The fix makes sense.
> > I manually resolved merge conflict and applied.
>
> Thanks for the manually conflict resolving. However, the patch set
> doesn't move all free operations outside of lock scope (e.g., for
> bpf_map_lookup_and_delete()), because htab of maps doesn't support it. I
> could post another patch set to do that.

Pls do. It's easier to review incrementally.
For the last year we've been working on "resilient spin lock" and
hope to post patches in a couple weeks.
In those patches we will be replacing htab bucket lock with this new lock,
so moving other locks out of htab lock region fits very well.