diff mbox series

mm/kmemleak: Don't hold kmemleak_lock when calling printk()

Message ID 20240228191444.481048-1-longman@redhat.com (mailing list archive)
State New
Headers show
Series mm/kmemleak: Don't hold kmemleak_lock when calling printk() | expand

Commit Message

Waiman Long Feb. 28, 2024, 7:14 p.m. UTC
When some error conditions happen (like OOM), some kmemleak functions
call printk() to dump out some useful debugging information while holding
the kmemleak_lock. This may cause deadlock as the printk() function
may need to allocate additional memory leading to a create_object()
call acquiring kmemleak_lock again.

Fix this deadlock issue by making sure that printk() is only called
after releasing the kmemleak_lock.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 64 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 18 deletions(-)

Comments

Catalin Marinas Feb. 29, 2024, 3:25 p.m. UTC | #1
On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
> When some error conditions happen (like OOM), some kmemleak functions
> call printk() to dump out some useful debugging information while holding
> the kmemleak_lock. This may cause deadlock as the printk() function
> may need to allocate additional memory leading to a create_object()
> call acquiring kmemleak_lock again.
> 
> Fix this deadlock issue by making sure that printk() is only called
> after releasing the kmemleak_lock.

I can't say I'm familiar with the printk() code but I always thought it
uses some ring buffers as it can be called from all kind of contexts and
allocation is not guaranteed.

If printk() ends up taking kmemleak_lock through the slab allocator, I
wonder whether we have bigger problems. The lock order is always
kmemleak_lock -> object->lock but if printk() triggers a callback into
kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
another potential deadlock.
Waiman Long Feb. 29, 2024, 3:55 p.m. UTC | #2
On 2/29/24 10:25, Catalin Marinas wrote:
> On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
>> When some error conditions happen (like OOM), some kmemleak functions
>> call printk() to dump out some useful debugging information while holding
>> the kmemleak_lock. This may cause deadlock as the printk() function
>> may need to allocate additional memory leading to a create_object()
>> call acquiring kmemleak_lock again.
>>
>> Fix this deadlock issue by making sure that printk() is only called
>> after releasing the kmemleak_lock.
> I can't say I'm familiar with the printk() code but I always thought it
> uses some ring buffers as it can be called from all kind of contexts and
> allocation is not guaranteed.
>
> If printk() ends up taking kmemleak_lock through the slab allocator, I
> wonder whether we have bigger problems. The lock order is always
> kmemleak_lock -> object->lock but if printk() triggers a callback into
> kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
> another potential deadlock.

object->lock is per object whereas kmemleak_lock is global. When taking 
object->lock and doing a data dump leading to a call that takes the 
kmemlock, it is highly unlikely the it will need to take that particular 
object->lock again. I do agree that lockdep may still warn about it if 
that happens as all the object->lock's are likely to be treated to be in 
the same class.

I should probably clarify in the change log that the lockdep splat is 
actually,

[ 3991.452558] Chain exists of: [ 3991.452559] console_owner -> 
&port->lock --> kmemleak_lock

So if kmemleak calls printk() acquiring either console_owner or 
port->lock. It may cause deadlock.

Cheers, Longman
Catalin Marinas March 1, 2024, 2:49 p.m. UTC | #3
On Thu, Feb 29, 2024 at 10:55:38AM -0500, Waiman Long wrote:
> On 2/29/24 10:25, Catalin Marinas wrote:
> > On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
> > > When some error conditions happen (like OOM), some kmemleak functions
> > > call printk() to dump out some useful debugging information while holding
> > > the kmemleak_lock. This may cause deadlock as the printk() function
> > > may need to allocate additional memory leading to a create_object()
> > > call acquiring kmemleak_lock again.
> > > 
> > > Fix this deadlock issue by making sure that printk() is only called
> > > after releasing the kmemleak_lock.
> > I can't say I'm familiar with the printk() code but I always thought it
> > uses some ring buffers as it can be called from all kind of contexts and
> > allocation is not guaranteed.
> > 
> > If printk() ends up taking kmemleak_lock through the slab allocator, I
> > wonder whether we have bigger problems. The lock order is always
> > kmemleak_lock -> object->lock but if printk() triggers a callback into
> > kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
> > another potential deadlock.
> 
> object->lock is per object whereas kmemleak_lock is global. When taking
> object->lock and doing a data dump leading to a call that takes the
> kmemlock, it is highly unlikely the it will need to take that particular
> object->lock again. I do agree that lockdep may still warn about it if that
> happens as all the object->lock's are likely to be treated to be in the same
> class.

Yeah, it's unlikely. I think it can only happen if there's a bug in
kmemleak (or slab) and the insertion fails because of the same object we
try to dump. But I suspect lockdep will complain either way.

> I should probably clarify in the change log that the lockdep splat is
> actually,
> 
> [ 3991.452558] Chain exists of: [ 3991.452559] console_owner -> &port->lock
> --> kmemleak_lock
> 
> So if kmemleak calls printk() acquiring either console_owner or port->lock.
> It may cause deadlock.

Could you please share the whole lockdep warning? IIUC, it's not the
printk() code allocating memory but somewhere down the line in the tty
layer.

Anyway, I had a look again at the kmemleak locking (I've been meaning to
simplify it for some time, drop the object->lock altogether). The only
time we nest object->lock within kmemleak_lock is during scan_block().
If we are unlucky to get some error on another CPU and dump that exact
object with printk(), it could lead to deadlock.

There's the dump_str_object_info() case as well triggered by a sysfs
write but luckily this takes the scan_mutex (same as during
scan_block()), so it solves the nesting problem.

I think in those error cases we can even ignore the object->lock when
dumping the info. Yeah, it can race, maybe not showing exactly the
precise data in some rare cases, but in those OOM scenarios it's
probably the least of our problem.
Waiman Long March 5, 2024, 3:20 a.m. UTC | #4
On 3/1/24 09:49, Catalin Marinas wrote:
> On Thu, Feb 29, 2024 at 10:55:38AM -0500, Waiman Long wrote:
>> On 2/29/24 10:25, Catalin Marinas wrote:
>>> On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
>>>> When some error conditions happen (like OOM), some kmemleak functions
>>>> call printk() to dump out some useful debugging information while holding
>>>> the kmemleak_lock. This may cause deadlock as the printk() function
>>>> may need to allocate additional memory leading to a create_object()
>>>> call acquiring kmemleak_lock again.
>>>>
>>>> Fix this deadlock issue by making sure that printk() is only called
>>>> after releasing the kmemleak_lock.
>>> I can't say I'm familiar with the printk() code but I always thought it
>>> uses some ring buffers as it can be called from all kind of contexts and
>>> allocation is not guaranteed.
>>>
>>> If printk() ends up taking kmemleak_lock through the slab allocator, I
>>> wonder whether we have bigger problems. The lock order is always
>>> kmemleak_lock -> object->lock but if printk() triggers a callback into
>>> kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
>>> another potential deadlock.
>> object->lock is per object whereas kmemleak_lock is global. When taking
>> object->lock and doing a data dump leading to a call that takes the
>> kmemlock, it is highly unlikely the it will need to take that particular
>> object->lock again. I do agree that lockdep may still warn about it if that
>> happens as all the object->lock's are likely to be treated to be in the same
>> class.
> Yeah, it's unlikely. I think it can only happen if there's a bug in
> kmemleak (or slab) and the insertion fails because of the same object we
> try to dump. But I suspect lockdep will complain either way.
>
>> I should probably clarify in the change log that the lockdep splat is
>> actually,
>>
>> [ 3991.452558] Chain exists of: [ 3991.452559] console_owner -> &port->lock
>> --> kmemleak_lock
>>
>> So if kmemleak calls printk() acquiring either console_owner or port->lock.
>> It may cause deadlock.
> Could you please share the whole lockdep warning? IIUC, it's not the
> printk() code allocating memory but somewhere down the line in the tty
> layer.
Yes, I will do that in the next version.
>
> Anyway, I had a look again at the kmemleak locking (I've been meaning to
> simplify it for some time, drop the object->lock altogether). The only
> time we nest object->lock within kmemleak_lock is during scan_block().
> If we are unlucky to get some error on another CPU and dump that exact
> object with printk(), it could lead to deadlock.
>
> There's the dump_str_object_info() case as well triggered by a sysfs
> write but luckily this takes the scan_mutex (same as during
> scan_block()), so it solves the nesting problem.
>
> I think in those error cases we can even ignore the object->lock when
> dumping the info. Yeah, it can race, maybe not showing exactly the
> precise data in some rare cases, but in those OOM scenarios it's
> probably the least of our problem.

I was thinking about not taking the object->lock too. You are right that 
under OOM, a little bit of racing doesn't really matter. Will do that in 
the next version.

Cheers,
Longman
diff mbox series

Patch

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 6a540c2b27c5..acd8742c80b5 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -401,6 +401,19 @@  static struct rb_root *object_tree(unsigned long objflags)
 	return &object_tree_root;
 }
 
+/*
+ * Increment the object use_count. Return 1 if successful or 0 otherwise. Note
+ * that once an object's use_count reached 0, the RCU freeing was already
+ * registered and the object should no longer be used. This function must be
+ * called under the protection of rcu_read_lock().
+ */
+static int get_object(struct kmemleak_object *object)
+{
+	return atomic_inc_not_zero(&object->use_count);
+}
+
+static void put_object(struct kmemleak_object *object);
+
 /*
  * Look-up a memory block metadata (kmemleak_object) in the object search
  * tree based on a pointer value. If alias is 0, only values pointing to the
@@ -413,6 +426,8 @@  static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
 	struct rb_node *rb = object_tree(objflags)->rb_node;
 	unsigned long untagged_ptr = (unsigned long)kasan_reset_tag((void *)ptr);
 
+	lockdep_assert_held(&kmemleak_lock);
+
 	while (rb) {
 		struct kmemleak_object *object;
 		unsigned long untagged_objp;
@@ -427,9 +442,20 @@  static struct kmemleak_object *__lookup_object(unsigned long ptr, int alias,
 		else if (untagged_objp == untagged_ptr || alias)
 			return object;
 		else {
+			if (!get_object(object))
+				break;
+			/*
+			 * Release kmemleak_lock and acquire object->lock
+			 * temporarily to avoid deadlock in printk().
+			 */
+			raw_spin_unlock(&kmemleak_lock);
 			kmemleak_warn("Found object by alias at 0x%08lx\n",
 				      ptr);
+			raw_spin_lock(&object->lock);
 			dump_object_info(object);
+			raw_spin_unlock(&object->lock);
+			put_object(object);
+			raw_spin_lock(&kmemleak_lock);
 			break;
 		}
 	}
@@ -442,22 +468,12 @@  static struct kmemleak_object *lookup_object(unsigned long ptr, int alias)
 	return __lookup_object(ptr, alias, 0);
 }
 
-/*
- * Increment the object use_count. Return 1 if successful or 0 otherwise. Note
- * that once an object's use_count reached 0, the RCU freeing was already
- * registered and the object should no longer be used. This function must be
- * called under the protection of rcu_read_lock().
- */
-static int get_object(struct kmemleak_object *object)
-{
-	return atomic_inc_not_zero(&object->use_count);
-}
-
 /*
  * Memory pool allocation and freeing. kmemleak_lock must not be held.
  */
 static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 {
+	bool warn = false;
 	unsigned long flags;
 	struct kmemleak_object *object;
 
@@ -477,9 +493,11 @@  static struct kmemleak_object *mem_pool_alloc(gfp_t gfp)
 	else if (mem_pool_free_count)
 		object = &mem_pool[--mem_pool_free_count];
 	else
-		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
+		warn = true;
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 
+	if (unlikely(warn))
+		pr_warn_once("Memory pool empty, consider increasing CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE\n");
 	return object;
 }
 
@@ -692,6 +710,8 @@  static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 	unsigned long untagged_ptr;
 	unsigned long untagged_objp;
 
+	lockdep_assert_held(&kmemleak_lock);
+
 	object->flags = OBJECT_ALLOCATED | objflags;
 	object->pointer = ptr;
 	object->size = kfence_ksize((void *)ptr) ?: size;
@@ -718,13 +738,21 @@  static int __link_object(struct kmemleak_object *object, unsigned long ptr,
 		else if (untagged_objp + parent->size <= untagged_ptr)
 			link = &parent->rb_node.rb_right;
 		else {
-			kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
-				      ptr);
+			if (!get_object(parent))
+				return -EEXIST;
 			/*
-			 * No need for parent->lock here since "parent" cannot
-			 * be freed while the kmemleak_lock is held.
+			 * Release kmemleak_lock & acquire parent->lock
+			 * temporarily to avoid deadlock in printk().
 			 */
+			raw_spin_unlock(&kmemleak_lock);
+
+			kmemleak_stop("Cannot insert 0x%lx into the object search tree (overlaps existing)\n",
+				      ptr);
+			raw_spin_lock(&parent->lock);
 			dump_object_info(parent);
+			raw_spin_unlock(&parent->lock);
+			put_object(parent);
+			raw_spin_lock(&kmemleak_lock);
 			return -EEXIST;
 		}
 	}
@@ -839,11 +867,12 @@  static void delete_object_part(unsigned long ptr, size_t size,
 	raw_spin_lock_irqsave(&kmemleak_lock, flags);
 	object = __find_and_remove_object(ptr, 1, objflags);
 	if (!object) {
+		raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 #ifdef DEBUG
 		kmemleak_warn("Partially freeing unknown object at 0x%08lx (size %zu)\n",
 			      ptr, size);
 #endif
-		goto unlock;
+		goto out;
 	}
 
 	/*
@@ -862,7 +891,6 @@  static void delete_object_part(unsigned long ptr, size_t size,
 			   object->min_count, objflags))
 		object_r = NULL;
 
-unlock:
 	raw_spin_unlock_irqrestore(&kmemleak_lock, flags);
 	if (object)
 		__delete_object(object);