Message ID | 20230119040111.350923-3-longman@redhat.com
---|---
State | New
Series | mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF
Hi Waiman,

Thanks for your effort on trying to fix this.

On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
> 				   &object_phys_tree_root :
> 				   &object_tree_root);
> -	list_del_rcu(&object->object_list);
> +	if (!(object->del_state & DELSTATE_NO_DELETE))
> +		list_del_rcu(&object->object_list);
> +	object->del_state |= DELSTATE_REMOVED;
> }

So IIUC, this prevents the current object being scanned from being
removed from the list during the kmemleak_cond_resched() call.

> /*
> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
> 	object->count = 0;			/* white color initially */
> 	object->jiffies = jiffies;
> 	object->checksum = 0;
> +	object->del_state = 0;
>
> 	/* task information */
> 	if (in_hardirq()) {
> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> 	if (!get_object(object))
> 		return;	/* Try next object */
>
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		goto unlock_put;	/* Object removed */
> +	object->del_state |= DELSTATE_NO_DELETE;
> +	raw_spin_unlock_irq(&kmemleak_lock);
> +
> 	rcu_read_unlock();
> 	cond_resched();
> 	rcu_read_lock();
> +
> +	raw_spin_lock_irq(&kmemleak_lock);
> +	if (object->del_state & DELSTATE_REMOVED)
> +		list_del_rcu(&object->object_list);
> +	object->del_state &= ~DELSTATE_NO_DELETE;
> +unlock_put:
> +	raw_spin_unlock_irq(&kmemleak_lock);
> 	put_object(object);
> }

I'm not sure this was the only problem. We do have the problem that the
current object may be removed from the list, solved above, but another
scenario I had in mind is the next object being released during this
brief resched period. The RCU relies on object->next->next being valid
but, with a brief rcu_read_unlock(), the object->next could be freed,
reallocated, so object->next->next invalid.
On 1/20/23 14:18, Catalin Marinas wrote:
> Hi Waiman,
>
> Thanks for your effort on trying to fix this.
>
> On Wed, Jan 18, 2023 at 11:01:11PM -0500, Waiman Long wrote:
>> @@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
>> 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
>> 				   &object_phys_tree_root :
>> 				   &object_tree_root);
>> -	list_del_rcu(&object->object_list);
>> +	if (!(object->del_state & DELSTATE_NO_DELETE))
>> +		list_del_rcu(&object->object_list);
>> +	object->del_state |= DELSTATE_REMOVED;
>> }
> So IIUC, this prevents the current object being scanned from being
> removed from the list during the kmemleak_cond_resched() call.

Yes, that is the point.

>> /*
>> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
>> 	object->count = 0;			/* white color initially */
>> 	object->jiffies = jiffies;
>> 	object->checksum = 0;
>> +	object->del_state = 0;
>>
>> 	/* task information */
>> 	if (in_hardirq()) {
>> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
>> 	if (!get_object(object))
>> 		return;	/* Try next object */
>>
>> +	raw_spin_lock_irq(&kmemleak_lock);
>> +	if (object->del_state & DELSTATE_REMOVED)
>> +		goto unlock_put;	/* Object removed */
>> +	object->del_state |= DELSTATE_NO_DELETE;
>> +	raw_spin_unlock_irq(&kmemleak_lock);
>> +
>> 	rcu_read_unlock();
>> 	cond_resched();
>> 	rcu_read_lock();
>> +
>> +	raw_spin_lock_irq(&kmemleak_lock);
>> +	if (object->del_state & DELSTATE_REMOVED)
>> +		list_del_rcu(&object->object_list);
>> +	object->del_state &= ~DELSTATE_NO_DELETE;
>> +unlock_put:
>> +	raw_spin_unlock_irq(&kmemleak_lock);
>> 	put_object(object);
>> }
> I'm not sure this was the only problem. We do have the problem that the
> current object may be removed from the list, solved above, but another
> scenario I had in mind is the next object being released during this
> brief resched period. The RCU relies on object->next->next being valid
> but, with a brief rcu_read_unlock(), the object->next could be freed,
> reallocated, so object->next->next invalid.

Looking at the following scenario,

object->next => A (removed)
A->next => B (removed)

As object->next is pointing to A, A must still be allocated and not freed
yet. Now if B is also removed, there are 2 possible cases.

1) B is removed from the list after the removal of A. In that case, it is
not possible that A is allocated, but B is freed.

2) B is removed before A. A->next can't point to B when it is being
removed. Due to weak memory ordering, it is possible that another cpu can
see A->next still pointing to B. In that case, I believe that it is still
within the grace period where neither A nor B is freed.

In fact, it is no different from a regular scan of the object list
without ever calling cond_resched().

Cheers,
Longman
On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
> On 1/20/23 14:18, Catalin Marinas wrote:
> > > /*
> > > @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
> > > 	object->count = 0;			/* white color initially */
> > > 	object->jiffies = jiffies;
> > > 	object->checksum = 0;
> > > +	object->del_state = 0;
> > >
> > > 	/* task information */
> > > 	if (in_hardirq()) {
> > > @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
> > > 	if (!get_object(object))
> > > 		return;	/* Try next object */
> > >
> > > +	raw_spin_lock_irq(&kmemleak_lock);
> > > +	if (object->del_state & DELSTATE_REMOVED)
> > > +		goto unlock_put;	/* Object removed */
> > > +	object->del_state |= DELSTATE_NO_DELETE;
> > > +	raw_spin_unlock_irq(&kmemleak_lock);
> > > +
> > > 	rcu_read_unlock();
> > > 	cond_resched();
> > > 	rcu_read_lock();
> > > +
> > > +	raw_spin_lock_irq(&kmemleak_lock);
> > > +	if (object->del_state & DELSTATE_REMOVED)
> > > +		list_del_rcu(&object->object_list);
> > > +	object->del_state &= ~DELSTATE_NO_DELETE;
> > > +unlock_put:
> > > +	raw_spin_unlock_irq(&kmemleak_lock);
> > > 	put_object(object);
> > > }
> > I'm not sure this was the only problem. We do have the problem that the
> > current object may be removed from the list, solved above, but another
> > scenario I had in mind is the next object being released during this
> > brief resched period. The RCU relies on object->next->next being valid
> > but, with a brief rcu_read_unlock(), the object->next could be freed,
> > reallocated, so object->next->next invalid.
>
> Looking at the following scenario,
>
> object->next => A (removed)
> A->next => B (removed)
>
> As object->next is pointing to A, A must still be allocated and not freed
> yet. Now if B is also removed, there are 2 possible cases.
>
> 1) B is removed from the list after the removal of A. In that case, it is
> not possible that A is allocated, but B is freed.
>
> 2) B is removed before A. A->next can't point to B when it is being
> removed. Due to weak memory ordering, it is possible that another cpu can
> see A->next still pointing to B. In that case, I believe that it is still
> within the grace period where neither A nor B is freed.
>
> In fact, it is no different from a regular scan of the object list
> without ever calling cond_resched().

More like thinking out loud:

The lockless RCU loop relies on object->next->next being valid within
the grace period (A not freed). Due to weak memory ordering, the looping
CPU may not observe the object->next update (removal of A) by another
CPU, so it continues to loop over it. But since we do an
rcu_read_unlock() in the middle of the loop, I don't think these
assumptions are still valid, so A may be freed.

What we need is that object->next reading for the following iteration
either sees the updated object->next (B) or it sees A but the latter is
still around. I think this holds with the proposed
kmemleak_cond_resched() since we now start a new grace period with
rcu_read_lock() followed by taking and releasing kmemleak_lock. The
latter would give us the memory ordering required since removing object
A from the list does take the lock.

So yeah, you are probably right, I just find it hard to get my head
around ;). I still think it would be simpler with a single kmemleak_lock
(no object->lock) but that's more involved than a simple fix.

Assuming your (and my) reasoning above is correct:

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
On 1/23/23 14:24, Catalin Marinas wrote:
> On Fri, Jan 20, 2023 at 05:54:28PM -0500, Waiman Long wrote:
>> On 1/20/23 14:18, Catalin Marinas wrote:
>>>> /*
>>>> @@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
>>>> 	object->count = 0;			/* white color initially */
>>>> 	object->jiffies = jiffies;
>>>> 	object->checksum = 0;
>>>> +	object->del_state = 0;
>>>>
>>>> 	/* task information */
>>>> 	if (in_hardirq()) {
>>>> @@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
>>>> 	if (!get_object(object))
>>>> 		return;	/* Try next object */
>>>>
>>>> +	raw_spin_lock_irq(&kmemleak_lock);
>>>> +	if (object->del_state & DELSTATE_REMOVED)
>>>> +		goto unlock_put;	/* Object removed */
>>>> +	object->del_state |= DELSTATE_NO_DELETE;
>>>> +	raw_spin_unlock_irq(&kmemleak_lock);
>>>> +
>>>> 	rcu_read_unlock();
>>>> 	cond_resched();
>>>> 	rcu_read_lock();
>>>> +
>>>> +	raw_spin_lock_irq(&kmemleak_lock);
>>>> +	if (object->del_state & DELSTATE_REMOVED)
>>>> +		list_del_rcu(&object->object_list);
>>>> +	object->del_state &= ~DELSTATE_NO_DELETE;
>>>> +unlock_put:
>>>> +	raw_spin_unlock_irq(&kmemleak_lock);
>>>> 	put_object(object);
>>>> }
>>> I'm not sure this was the only problem. We do have the problem that the
>>> current object may be removed from the list, solved above, but another
>>> scenario I had in mind is the next object being released during this
>>> brief resched period. The RCU relies on object->next->next being valid
>>> but, with a brief rcu_read_unlock(), the object->next could be freed,
>>> reallocated, so object->next->next invalid.
>> Looking at the following scenario,
>>
>> object->next => A (removed)
>> A->next => B (removed)
>>
>> As object->next is pointing to A, A must still be allocated and not freed
>> yet. Now if B is also removed, there are 2 possible cases.
>>
>> 1) B is removed from the list after the removal of A. In that case, it is
>> not possible that A is allocated, but B is freed.
>>
>> 2) B is removed before A. A->next can't point to B when it is being
>> removed. Due to weak memory ordering, it is possible that another cpu can
>> see A->next still pointing to B. In that case, I believe that it is still
>> within the grace period where neither A nor B is freed.
>>
>> In fact, it is no different from a regular scan of the object list
>> without ever calling cond_resched().
> More like thinking out loud:
>
> The lockless RCU loop relies on object->next->next being valid within
> the grace period (A not freed). Due to weak memory ordering, the looping
> CPU may not observe the object->next update (removal of A) by another
> CPU, so it continues to loop over it. But since we do an
> rcu_read_unlock() in the middle of the loop, I don't think these
> assumptions are still valid, so A may be freed.
>
> What we need is that object->next reading for the following iteration
> either sees the updated object->next (B) or it sees A but the latter is
> still around. I think this holds with the proposed
> kmemleak_cond_resched() since we now start a new grace period with
> rcu_read_lock() followed by taking and releasing kmemleak_lock. The
> latter would give us the memory ordering required since removing object
> A from the list does take the lock.
>
> So yeah, you are probably right, I just find it hard to get my head
> around ;). I still think it would be simpler with a single kmemleak_lock
> (no object->lock) but that's more involved than a simple fix.
>
> Assuming your (and my) reasoning above is correct:
>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

I should have mentioned the fact that taking the kmemleak_lock will
provide some ordering guarantee since it is done after a new
rcu_read_lock(). So yes, even if both A and B are removed from the
object_list, they should still be around and not freed yet.

Thanks for your review.

Cheers,
Longman
diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index e7cb521236bf..0ece170fc9ef 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -13,11 +13,12 @@
  *
  * The following locks and mutexes are used by kmemleak:
  *
- * - kmemleak_lock (raw_spinlock_t): protects the object_list modifications and
- *   accesses to the object_tree_root (or object_phys_tree_root). The
- *   object_list is the main list holding the metadata (struct kmemleak_object)
- *   for the allocated memory blocks. The object_tree_root and object_phys_tree_root
- *   are red black trees used to look-up metadata based on a pointer to the
+ * - kmemleak_lock (raw_spinlock_t): protects the object_list as well as
+ *   del_state modifications and accesses to the object_tree_root (or
+ *   object_phys_tree_root). The object_list is the main list holding the
+ *   metadata (struct kmemleak_object) for the allocated memory blocks.
+ *   The object_tree_root and object_phys_tree_root are red
+ *   black trees used to look-up metadata based on a pointer to the
  *   corresponding memory block. The object_phys_tree_root is for objects
  *   allocated with physical address. The kmemleak_object structures are
  *   added to the object_list and object_tree_root (or object_phys_tree_root)
@@ -147,6 +148,7 @@ struct kmemleak_object {
 	struct rcu_head rcu;		/* object_list lockless traversal */
 	/* object usage count; object freed when use_count == 0 */
 	atomic_t use_count;
+	unsigned int del_state;		/* deletion state */
 	unsigned long pointer;
 	size_t size;
 	/* pass surplus references to this pointer */
@@ -177,6 +179,11 @@ struct kmemleak_object {
 /* flag set for object allocated with physical address */
 #define OBJECT_PHYS		(1 << 4)
 
+/* set when __remove_object() called */
+#define DELSTATE_REMOVED	(1 << 0)
+/* set to temporarily prevent deletion from object_list */
+#define DELSTATE_NO_DELETE	(1 << 1)
+
 #define HEX_PREFIX		"    "
 /* number of bytes to print per line; must be 16 or 32 */
 #define HEX_ROW_SIZE		16
@@ -567,7 +574,9 @@ static void __remove_object(struct kmemleak_object *object)
 	rb_erase(&object->rb_node, object->flags & OBJECT_PHYS ?
 				   &object_phys_tree_root :
 				   &object_tree_root);
-	list_del_rcu(&object->object_list);
+	if (!(object->del_state & DELSTATE_NO_DELETE))
+		list_del_rcu(&object->object_list);
+	object->del_state |= DELSTATE_REMOVED;
 }
 
 /*
@@ -633,6 +642,7 @@ static void __create_object(unsigned long ptr, size_t size,
 	object->count = 0;			/* white color initially */
 	object->jiffies = jiffies;
 	object->checksum = 0;
+	object->del_state = 0;
 
 	/* task information */
 	if (in_hardirq()) {
@@ -1470,9 +1480,22 @@ static void kmemleak_cond_resched(struct kmemleak_object *object)
 	if (!get_object(object))
 		return;	/* Try next object */
 
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		goto unlock_put;	/* Object removed */
+	object->del_state |= DELSTATE_NO_DELETE;
+	raw_spin_unlock_irq(&kmemleak_lock);
+
 	rcu_read_unlock();
 	cond_resched();
 	rcu_read_lock();
+
+	raw_spin_lock_irq(&kmemleak_lock);
+	if (object->del_state & DELSTATE_REMOVED)
+		list_del_rcu(&object->object_list);
+	object->del_state &= ~DELSTATE_NO_DELETE;
+unlock_put:
+	raw_spin_unlock_irq(&kmemleak_lock);
 	put_object(object);
 }
Commit 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object
iteration loop of kmemleak_scan()") fixes the soft lockup problem in
kmemleak_scan() by periodically doing a cond_resched(). It does take a
reference of the current object before doing so. Unfortunately, if the
object has been deleted from the object_list, the next object pointed to
by its next pointer may no longer be valid after coming back from
cond_resched(). This can result in use-after-free and other nasty
problems.

Fix this problem by adding a del_state flag into the kmemleak_object
structure to synchronize the object deletion process between
kmemleak_cond_resched() and __remove_object() to make sure that the
object remains in the object_list for the duration of the cond_resched()
call.

Fixes: 6edda04ccc7c ("mm/kmemleak: prevent soft lockup in first object iteration loop of kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/kmemleak.c | 35 +++++++++++++++++++++++++++++------
 1 file changed, 29 insertions(+), 6 deletions(-)