| Message ID | 20230523140002.575854344@linutronix.de (mailing list archive) |
|---|---|
| State | New |
| Series | mm/vmalloc: Assorted fixes and improvements |
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Tue, May 23 2023 at 17:17, Christoph Hellwig wrote:
> Looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

This probably wants a Fixes tag and if I'm not completely off then this
problem exists since 2.6.28: db64fe02258f ("mm: rewrite vmap layer")
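For reference, the tag being suggested here would be a single changelog line
built from the commit id and subject cited above, conventionally placed next
to the Signed-off-by line:

    Fixes: db64fe02258f ("mm: rewrite vmap layer")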
On Tue, May 23, 2023 at 06:40:40PM +0200, Thomas Gleixner wrote:
> On Tue, May 23 2023 at 17:17, Christoph Hellwig wrote:
> > Looks good:
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
>
> Thanks!
>
> This probably wants a Fixes tag and if I'm not completely off then this
> problem exists since 2.6.28: db64fe02258f ("mm: rewrite vmap layer")
>
Yes. This code is quite old and the commit you pointed to looks correct.

--
Uladzislau Rezki
On Tue, May 23, 2023 at 04:02:11PM +0200, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
>
> This is tried to achieve by walking the per CPU free lists, but those do
> not contain fully utilized vmap blocks because they are removed from the
> free list once the blocks free space became zero.

Oh Lord.

>
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
>
>   x = vb_alloc()       // Removes vmap_block from free list because vb->free became 0
>   vb_free(x)           // Unmaps page and marks in dirty_min/max range
>
>   // Page is reused
>   vm_unmap_aliases()   // Can't find vmap block with the dirty space -> FAIL
>
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>
>  		rcu_read_lock();
> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>

LGTM.

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
On Tue, May 23, 2023 at 08:18:10PM +0100, Lorenzo Stoakes wrote:
> On Tue, May 23, 2023 at 04:02:11PM +0200, Thomas Gleixner wrote:
> > _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > page are left in the system. This is required due to the lazy TLB flush
> > mechanism in vmalloc.
> >
> > This is tried to achieve by walking the per CPU free lists, but those do
> > not contain fully utilized vmap blocks because they are removed from the
> > free list once the blocks free space became zero.
>
> Oh Lord.
>
> >
> > So the per CPU list iteration does not find the block and if the page was
> > mapped via such a block and the TLB has not yet been flushed, the guarantee
> > of _vm_unmap_aliases() that there are no stale TLBs after returning is
> > broken:
> >
> >   x = vb_alloc()       // Removes vmap_block from free list because vb->free became 0
> >   vb_free(x)           // Unmaps page and marks in dirty_min/max range
> >
> >   // Page is reused
> >   vm_unmap_aliases()   // Can't find vmap block with the dirty space -> FAIL
> >
> > So instead of walking the per CPU free lists, walk the per CPU xarrays
> > which hold pointers to _all_ active blocks in the system including those
> > removed from the free lists.
> >
> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  mm/vmalloc.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
> >  	for_each_possible_cpu(cpu) {
> >  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
> >  		struct vmap_block *vb;
> > +		unsigned long idx;
> >
> >  		rcu_read_lock();
> > -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> > +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
> >  			spin_lock(&vb->lock);
> >  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> >  				unsigned long va_start = vb->va->va_start;
> >
>
> LGTM.
>
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
>
> This is tried to achieve by walking the per CPU free lists, but those do
> not contain fully utilized vmap blocks because they are removed from the
> free list once the blocks free space became zero.

The problem description is not accurate. This is tried to achieve for
va associated with vmap_block by walking the per CPU free lists, those
fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
by calculating the [min:max] of purge_vmap_area_list, because va of
vmap_blocks will be added to purge_vmap_area_list too via vb_free().

If we finally exclude the vmap_block va from the purge list in
__purge_vmap_area_lazy(), then we can say that stale TLBs are missed to
flush. No? IMHO, this is preparation work for excluding the vmap_block
va from purge list flushing.

Please correct me if I am wrong.

>
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
>
>   x = vb_alloc()       // Removes vmap_block from free list because vb->free became 0
>   vb_free(x)           // Unmaps page and marks in dirty_min/max range
>
>   // Page is reused
>   vm_unmap_aliases()   // Can't find vmap block with the dirty space -> FAIL
>
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>
>  		rcu_read_lock();
> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>
On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> page are left in the system. This is required due to the lazy TLB flush
> mechanism in vmalloc.
>
> This is tried to achieve by walking the per CPU free lists, but those do
> not contain fully utilized vmap blocks because they are removed from the
> free list once the blocks free space became zero.
>
> So the per CPU list iteration does not find the block and if the page was
> mapped via such a block and the TLB has not yet been flushed, the guarantee
> of _vm_unmap_aliases() that there are no stale TLBs after returning is
> broken:
>
>   x = vb_alloc()       // Removes vmap_block from free list because vb->free became 0
>   vb_free(x)           // Unmaps page and marks in dirty_min/max range
>
>   // Page is reused
>   vm_unmap_aliases()   // Can't find vmap block with the dirty space -> FAIL
>
> So instead of walking the per CPU free lists, walk the per CPU xarrays
> which hold pointers to _all_ active blocks in the system including those
> removed from the free lists.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  mm/vmalloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>  	for_each_possible_cpu(cpu) {
>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>  		struct vmap_block *vb;
> +		unsigned long idx;
>
>  		rcu_read_lock();

Do we need to remove this rcu_read_xx() pair since it marks the RCU
read-side critical section on vbq->free list?

> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>  			spin_lock(&vb->lock);
>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>  				unsigned long va_start = vb->va->va_start;
>
On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
>> page are left in the system. This is required due to the lazy TLB flush
>> mechanism in vmalloc.
>>
>> This is tried to achieve by walking the per CPU free lists, but those do
>> not contain fully utilized vmap blocks because they are removed from the
>> free list once the blocks free space became zero.
>
> The problem description is not accurate. This is tried to achieve for
> va associated with vmap_block by walking the per CPU free lists, those
> fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> by calculating the [min:max] of purge_vmap_area_list, because va of
> vmap_blocks will be added to purge_vmap_area_list too via vb_free().

No. The fully utilized block cannot be purged when there are still
active mappings on it. Again:

    X = vb_alloc()
    ...
    Y = vb_alloc()
        vb->free -= order;
        if (!vb->vb_free)
            list_del(vb->free_list);
    ...
    vb_free(Y)
        vb->dirty += order;
        if (vb->dirty == VMAP_BBMAP_BITS)   // Condition is _false_
            free_block();

So because $X is not yet unmapped the block is neither on the free list
nor on purge_vmap_area_list.

Thanks,

        tglx
On Wed, May 24 2023 at 17:32, Baoquan He wrote:
> On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>>  	for_each_possible_cpu(cpu) {
>>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>>  		struct vmap_block *vb;
>> +		unsigned long idx;
>>
>>  		rcu_read_lock();
>
> Do we need to remove this rcu_read_xx() pair since it marks the RCU
> read-side critical section on vbq-free list?

And what protects the xarray lookup?

>> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
>> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
>>  			spin_lock(&vb->lock);
>>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
>>  				unsigned long va_start = vb->va->va_start;
>>
On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> >> page are left in the system. This is required due to the lazy TLB flush
> >> mechanism in vmalloc.
> >>
> >> This is tried to achieve by walking the per CPU free lists, but those do
> >> not contain fully utilized vmap blocks because they are removed from the
> >> free list once the blocks free space became zero.
> >
> > The problem description is not accurate. This is tried to achieve for
> > va associated with vmap_block by walking the per CPU free lists, those
> > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > by calculating the [min:max] of purge_vmap_area_list, because va of
> > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
>
> No. The fully utilized block cannot be purged when there are still
> active mappings on it. Again:
>
>     X = vb_alloc()
>     ...
>     Y = vb_alloc()
>         vb->free -= order;
>         if (!vb->vb_free)
>             list_del(vb->free_list);
>     ...
>     vb_free(Y)
>         vb->dirty += order;
>         if (vb->dirty == VMAP_BBMAP_BITS)   // Condition is _false_
>             free_block();

vb_free(Y)
    vb->dirty += order;
    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
        free_vmap_block();
        -->free_vmap_area_noflush()
           -->merge_or_add_vmap_area(va,
                &purge_vmap_area_root, &purge_vmap_area_list);

The last mapped region will be freed and added to purge list via
vb_free(), it will be flushed with other va in purge list. When it's
mapped via vb_alloc(), it's detached from vbq->free list. When it's
freed via vb_alloc(), it's added to purge list, and flushed, the thing
is duplicated flushing, no missing flush seen here?

>
> So because $X is not yet unmapped the block is neither on the free list
> nor on purge_vmap_area_list.

Yeah, because $X is not yet unmapped, the block could have unmapped part
flushed, or unflushed. For unflushed part, it's got flushed with $X
altogether in the purge list.
On 05/24/23 at 07:24pm, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> > On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> > >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > >> page are left in the system. This is required due to the lazy TLB flush
> > >> mechanism in vmalloc.
> > >>
> > >> This is tried to achieve by walking the per CPU free lists, but those do
> > >> not contain fully utilized vmap blocks because they are removed from the
> > >> free list once the blocks free space became zero.
> > >
> > > The problem description is not accurate. This is tried to achieve for
> > > va associated with vmap_block by walking the per CPU free lists, those
> > > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > > by calculating the [min:max] of purge_vmap_area_list, because va of
> > > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
> >
> > No. The fully utilized block cannot be purged when there are still
> > active mappings on it. Again:
> >
> >     X = vb_alloc()
> >     ...
> >     Y = vb_alloc()
> >         vb->free -= order;
> >         if (!vb->vb_free)
> >             list_del(vb->free_list);
> >     ...
> >     vb_free(Y)
> >         vb->dirty += order;
> >         if (vb->dirty == VMAP_BBMAP_BITS)   // Condition is _false_
> >             free_block();
>
> vb_free(Y)
>     vb->dirty += order;
>     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block();
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);
>
> The last mapped region will be freed and added to purge list via
> vb_free(), it will be flushed with other va in purge list. When it's
> mapped via vb_alloc(), it's detached from vbq->free list. When it's
> freed via vb_alloc(), it's added to purge list, and flushed, the thing
            ~~~~~~~~ vb_free(), typo
> is duplicated flushing, no missing flush seen here?
>
> >
> > So because $X is not yet unmapped the block is neither on the free list
> > nor on purge_vmap_area_list.
>
> Yeah, because $X is not yet unmapped, the block could have unmapped part
> flushed, or unflushed. For unflushed part, it's got flushed with $X
> altogether in the purge list.
>
On Wed, May 24, 2023 at 07:24:43PM +0800, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> > On Wed, May 24 2023 at 17:25, Baoquan He wrote:
> > > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> > >> _vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
> > >> page are left in the system. This is required due to the lazy TLB flush
> > >> mechanism in vmalloc.
> > >>
> > >> This is tried to achieve by walking the per CPU free lists, but those do
> > >> not contain fully utilized vmap blocks because they are removed from the
> > >> free list once the blocks free space became zero.
> > >
> > > The problem description is not accurate. This is tried to achieve for
> > > va associated with vmap_block by walking the per CPU free lists, those
> > > fully utilized vmap blocks can still be flushed in __purge_vmap_area_lazy()
> > > by calculating the [min:max] of purge_vmap_area_list, because va of
> > > vmap_blocks will be added to purge_vmap_area_list too via vb_free().
> >
> > No. The fully utilized block cannot be purged when there are still
> > active mappings on it. Again:
> >
> >     X = vb_alloc()
> >     ...
> >     Y = vb_alloc()
> >         vb->free -= order;
> >         if (!vb->vb_free)
> >             list_del(vb->free_list);
> >     ...
> >     vb_free(Y)
> >         vb->dirty += order;
> >         if (vb->dirty == VMAP_BBMAP_BITS)   // Condition is _false_
> >             free_block();
>
> vb_free(Y)
>     vb->dirty += order;
>     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block();
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);
>
> The last mapped region will be freed and added to purge list via
> vb_free(), it will be flushed with other va in purge list. When it's
> mapped via vb_alloc(), it's detached from vbq->free list. When it's
> freed via vb_alloc(), it's added to purge list, and flushed, the thing
> is duplicated flushing, no missing flush seen here?
>
> >
> > So because $X is not yet unmapped the block is neither on the free list
> > nor on purge_vmap_area_list.
>
> Yeah, because $X is not yet unmapped, the block could have unmapped part
> flushed, or unflushed. For unflushed part, it's got flushed with $X
> altogether in the purge list.
>
If we gurantee that vb is fully flushed, then just do not add it it
purge list:

@@ -2086,12 +2090,7 @@ static void free_vmap_block(struct vmap_block *vb)
 	BUG_ON(tmp != vb);
 
 	z = addr_to_cvz(vb->va->va_start);
-
-	spin_lock(&z->busy.lock);
-	unlink_va(vb->va, &z->busy.root);
-	spin_unlock(&z->busy.lock);
-
-	free_vmap_area_noflush(vb->va);
+	free_vmap_area(vb->va);
 
 	kfree_rcu(vb, rcu_head);
 }

and directly return back into global vmap heap.

--
Uladzislau Rezki
On Wed, May 24 2023 at 19:24, Baoquan He wrote:
> On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> vb_free(Y)
>     vb->dirty += order;
>     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>         free_vmap_block();
>         -->free_vmap_area_noflush()
>            -->merge_or_add_vmap_area(va,
>                 &purge_vmap_area_root, &purge_vmap_area_list);

This is irrelevant. The path is _NOT_ taken. You even copied the
comment:

    if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_

Did you actually read what I wrote?

Again: It _CANNOT_ be on the purge list because it has active mappings:

1   X = vb_alloc()
    ...
    Y = vb_alloc()
        vb->free -= order;            // Free space goes to 0
        if (!vb->vb_free)
2           list_del(vb->free_list);  // Block is removed from free list
    ...
    vb_free(Y)
        vb->dirty += order;
3       if (vb->dirty == VMAP_BBMAP_BITS)  // Condition is _false_
                                           // because #1 $X is still mapped
                                           // so block is _NOT_ freed and
                                           // _NOT_ put on the purge list

4   unmap_aliases()
        walk_free_list()   // Does not find it because of #2
        walk_purge_list()  // Does not find it because of #3

If the resulting flush range is not covering the $Y TLBs then stale TLBs
stay around.

The xarray walk finds it and guarantees that the TLBs are gone when
unmap_aliases() returns, which is the whole purpose of that function.

Thanks,

        tglx
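To make the accounting in the example above easier to follow, here is a small
self-contained userspace model in plain C (the struct, helper names and page
count are the model's own, not the kernel's): a block drops off the free list
as soon as its free space hits zero, but it only becomes purge-list material
once every page in it is dirty, so a block with one live mapping and a large
unflushed dirty range is visible on neither list.

    #include <stdbool.h>
    #include <stdio.h>

    #define BLOCK_PAGES 1024        /* pages per block; arbitrary for this model */

    /* Hypothetical, simplified stand-in for struct vmap_block. */
    struct block {
            unsigned int free;      /* pages still available to vb_alloc()         */
            unsigned int dirty;     /* pages unmapped but with possibly stale TLBs */
            bool on_free_list;      /* models membership of vbq->free              */
            bool on_purge_list;     /* models hand-over to the purge list          */
    };

    static void vb_alloc(struct block *vb, unsigned int pages)
    {
            vb->free -= pages;
            if (!vb->free)
                    vb->on_free_list = false;       /* mirrors the list_del() step */
    }

    static void vb_free(struct block *vb, unsigned int pages)
    {
            vb->dirty += pages;
            if (vb->dirty == BLOCK_PAGES)           /* only a fully dirty block    */
                    vb->on_purge_list = true;       /* mirrors free_vmap_block()   */
    }

    int main(void)
    {
            struct block vb = { .free = BLOCK_PAGES, .on_free_list = true };

            vb_alloc(&vb, 1);                       /* X: stays mapped                     */
            vb_alloc(&vb, BLOCK_PAGES - 1);         /* Y: free hits 0 -> off the free list */
            vb_free(&vb, BLOCK_PAGES - 1);          /* Y unmapped, TLBs not yet flushed    */

            /* Prints "free list: 0 purge list: 0 dirty: 1023/1024": a walk of
             * either list misses this block, which is what the xarray walk fixes. */
            printf("free list: %d purge list: %d dirty: %u/%u\n",
                   vb.on_free_list, vb.on_purge_list, vb.dirty, BLOCK_PAGES);
            return 0;
    }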
On Wed, May 24 2023 at 13:36, Uladzislau Rezki wrote:
> On Wed, May 24, 2023 at 07:24:43PM +0800, Baoquan He wrote:
>> Yeah, because $X is not yet unmapped, the block could have unmapped part
>> flushed, or unflushed. For unflushed part, it's got flushed with $X
>> altogether in the purge list.
>>
> If we gurantee that vb is fully flushed, then just do not add it it
> purge list:

There is no such guarantee today and there won't be one even with patch
3/6 applied because:

unmap_aliases()
    flush(dirty_min/max)
    reset(dirty_min/max)

vb_free(lastmapping)
    vunmap_range_noflush();
    vb->dirty += order;
    if (vb->dirty == VMAP_BBMAP_BITS)
        free_block(vb);   <-- the TLBs related to @lastmapping are
                              _NOT_ yet flushed.

Thanks,

        tglx
On 05/24/23 at 02:44pm, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 19:24, Baoquan He wrote:
> > On 05/24/23 at 11:51am, Thomas Gleixner wrote:
> > vb_free(Y)
> >     vb->dirty += order;
> >     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
> >         free_vmap_block();
> >         -->free_vmap_area_noflush()
> >            -->merge_or_add_vmap_area(va,
> >                 &purge_vmap_area_root, &purge_vmap_area_list);
>
> This is irrelevant. The path is _NOT_ taken. You even copied the
> comment:

Ah, just copied the whole, didn't notice that. I am not a scrupulous
person.

>
>     if (vb->dirty == VMAP_BBMAP_BITS) // Condition is _false_
>
> Did you actually read what I wrote?
>
> Again: It _CANNOT_ be on the purge list because it has active mappings:
>
> 1   X = vb_alloc()
>     ...
>     Y = vb_alloc()
>         vb->free -= order;            // Free space goes to 0
>         if (!vb->vb_free)
> 2           list_del(vb->free_list);  // Block is removed from free list
>     ...
>     vb_free(Y)
>         vb->dirty += order;
> 3       if (vb->dirty == VMAP_BBMAP_BITS)  // Condition is _false_
>                                            // because #1 $X is still mapped
>                                            // so block is _NOT_ freed and
>                                            // _NOT_ put on the purge list

So what if $X is unmapped via vb_free($X)? Does the condition satisfied
and can the vb put into purge list? In your above example, $Y's flush is
deferred, but not missed?

>
> 4   unmap_aliases()
>         walk_free_list()   // Does not find it because of #2
>         walk_purge_list()  // Does not find it because of #3
>
> If the resulting flush range is not covering the $Y TLBs then stale TLBs
> stay around.

OK, your mean the TLB of $Y will stay around after vb_free() until
the whole vb becomes dirty, and fix that in this patch, you are right.
vm_unmap_aliases() may need try to flush all unmapped ranges in
this case but failed on $Y, while the page which is being reused has the
old alias of $Y.

My thought was attracted to the repeated flush of vmap_block va on purge
list.

By the way, you don't fix issue that in vm_reset_perms(), the direct map
range will be accumulated with vb va and purge va and could produce
flushing range including huge gap, do you still plan to fix that? I
remember you said you will use array to gather ranges and flush them one
by one.

> The xarray walk finds it and guarantees that the TLBs are gone when
> unmap_aliases() returns, which is the whole purpose of that function.
>
> Thanks,
>
>         tglx
>
On 05/24/23 at 11:52am, Thomas Gleixner wrote:
> On Wed, May 24 2023 at 17:32, Baoquan He wrote:
> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
> >> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
> >>  	for_each_possible_cpu(cpu) {
> >>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
> >>  		struct vmap_block *vb;
> >> +		unsigned long idx;
> >>
> >>  		rcu_read_lock();
> >
> > Do we need to remove this rcu_read_xx() pair since it marks the RCU
> > read-side critical section on vbq-free list?
>
> And what protects the xarray lookup?

We have put rcu_read_lock() pair around the xas_find(). And it will find
into xarray for each iteration item. We won't lose the connection to the
next element like list adding or deleting? not very sure, I could be
wrong.

xa_for_each()
-->xa_for_each_start()
   -->xa_for_each_range()
      -->xa_find()

void *xa_find(struct xarray *xa, unsigned long *indexp,
			unsigned long max, xa_mark_t filter)
{
	......

	rcu_read_lock();
	do {
		if ((__force unsigned int)filter < XA_MAX_MARKS)
			entry = xas_find_marked(&xas, max, filter);
		else
			entry = xas_find(&xas, max);
	} while (xas_retry(&xas, entry));
	rcu_read_unlock();

	if (entry)
		*indexp = xas.xa_index;
	return entry;
}

>
> >> -		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> >> +		xa_for_each(&vbq->vmap_blocks, idx, vb) {
> >>  			spin_lock(&vb->lock);
> >>  			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
> >>  				unsigned long va_start = vb->va->va_start;
> >>
>
On Wed, May 24 2023 at 21:41, Baoquan He wrote:
> On 05/24/23 at 02:44pm, Thomas Gleixner wrote:
>> On Wed, May 24 2023 at 19:24, Baoquan He wrote:
>> Again: It _CANNOT_ be on the purge list because it has active mappings:
>>
>> 1   X = vb_alloc()
>>     ...
>>     Y = vb_alloc()
>>         vb->free -= order;            // Free space goes to 0
>>         if (!vb->vb_free)
>> 2           list_del(vb->free_list);  // Block is removed from free list
>>     ...
>>     vb_free(Y)
>>         vb->dirty += order;
>> 3       if (vb->dirty == VMAP_BBMAP_BITS)  // Condition is _false_
>>                                            // because #1 $X is still mapped
>>                                            // so block is _NOT_ freed and
>>                                            // _NOT_ put on the purge list
>
> So what if $X is unmapped via vb_free($X)? Does the condition satisfied
> and can the vb put into purge list?

Yes, but it is _irrelevant_ for the problem at hand.

> In your above example, $Y's flush is deferred, but not missed?

Yes, but that violates the guarantee of vm_unmap_aliases():

 * The vmap/vmalloc layer lazily flushes kernel virtual mappings primarily
 * to amortize TLB flushing overheads. What this means is that any page you
 * have now, may, in a former life, have been mapped into kernel virtual
 * address by the vmap layer and so there might be some CPUs with TLB entries
 * still referencing that page (additional to the regular 1:1 kernel mapping).
 *
 * vm_unmap_aliases flushes all such lazy mappings. After it returns, we can
 * be sure that none of the pages we have control over will have any aliases
 * from the vmap layer.

>> 4   unmap_aliases()
>>         walk_free_list()   // Does not find it because of #2
>>         walk_purge_list()  // Does not find it because of #3
>>
>> If the resulting flush range is not covering the $Y TLBs then stale TLBs
>> stay around.
>
> OK, your mean the TLB of $Y will stay around after vb_free() until
> the whole vb becomes dirty, and fix that in this patch, you are right.
> vm_unmap_aliases() may need try to flush all unmapped ranges in
> this case but failed on $Y, while the page which is being reused has the
> old alias of $Y.

vm_unmap_aliases() _must_ guarantee that the old TLBs for $Y are gone.

> My thought was attracted to the repeated flush of vmap_block va on purge
> list.
>
> By the way, you don't fix issue that in vm_reset_perms(), the direct map
> range will be accumulated with vb va and purge va and could produce
> flushing range including huge gap, do you still plan to fix that? I
> remember you said you will use array to gather ranges and flush them one
> by one.

One thing at a time. This series is a prerequisite.

Thanks,

        tglx
On Wed, May 24 2023 at 22:10, Baoquan He wrote:
> On 05/24/23 at 11:52am, Thomas Gleixner wrote:
>> On Wed, May 24 2023 at 17:32, Baoquan He wrote:
>> > On 05/23/23 at 04:02pm, Thomas Gleixner wrote:
>> >> @@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
>> >>  	for_each_possible_cpu(cpu) {
>> >>  		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
>> >>  		struct vmap_block *vb;
>> >> +		unsigned long idx;
>> >>
>> >>  		rcu_read_lock();
>> >
>> > Do we need to remove this rcu_read_xx() pair since it marks the RCU
>> > read-side critical section on vbq-free list?
>>
>> And what protects the xarray lookup?
>
> We have put rcu_read_lock() pair around the xas_find(). And it will find
> into xarray for each iteration item. We won't lose the connection to the
> next element like list adding or deleting? not very sure, I could be
> wrong.
>
> xa_for_each()
> -->xa_for_each_start()
>    -->xa_for_each_range()
>       -->xa_find()

I know how xarray works. No need to copy the code.

rcu_read_lock() inside xa_find() protects the search, but it does not
protect the returned entry, which might go away right after xa_find()
does rcu_read_unlock().

Thanks,

        tglx
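In other words, the rcu_read_lock()/rcu_read_unlock() pair around the loop in
_vm_unmap_aliases() stays where the patch puts it. Below is a minimal sketch
of the resulting pattern, with the loop body elided (the full body is in the
hunk quoted above) and the lifetime reasoning spelled out as comments; the
guarantee that vb itself cannot be freed under the iteration comes from
free_vmap_block() releasing the block via kfree_rcu(), as visible in the
free_vmap_block() diff earlier in the thread.

    rcu_read_lock();                        /* caller-side read section: blocks are  */
                                            /* freed via kfree_rcu(), so a looked-up */
                                            /* vb cannot vanish while this CPU's     */
                                            /* iteration still uses it               */
    xa_for_each(&vbq->vmap_blocks, idx, vb) {
            spin_lock(&vb->lock);           /* per-block state (dirty_min/max etc.)  */
                                            /* is serialized by vb->lock             */
            /* ... compute the flush range as in the hunk above ... */
            spin_unlock(&vb->lock);
    }
    rcu_read_unlock();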
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2236,9 +2236,10 @@ static void _vm_unmap_aliases(unsigned l
 	for_each_possible_cpu(cpu) {
 		struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
 		struct vmap_block *vb;
+		unsigned long idx;
 
 		rcu_read_lock();
-		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
+		xa_for_each(&vbq->vmap_blocks, idx, vb) {
 			spin_lock(&vb->lock);
 			if (vb->dirty && vb->dirty != VMAP_BBMAP_BITS) {
 				unsigned long va_start = vb->va->va_start;
_vm_unmap_aliases() is used to ensure that no unflushed TLB entries for a
page are left in the system. This is required due to the lazy TLB flush
mechanism in vmalloc.

This is attempted by walking the per CPU free lists, but those do not
contain fully utilized vmap blocks because they are removed from the free
list once the block's free space becomes zero.

So the per CPU list iteration does not find the block and if the page was
mapped via such a block and the TLB has not yet been flushed, the guarantee
of _vm_unmap_aliases() that there are no stale TLBs after returning is
broken:

  x = vb_alloc()       // Removes vmap_block from free list because vb->free became 0
  vb_free(x)           // Unmaps page and marks in dirty_min/max range

  // Page is reused
  vm_unmap_aliases()   // Can't find vmap block with the dirty space -> FAIL

So instead of walking the per CPU free lists, walk the per CPU xarrays
which hold pointers to _all_ active blocks in the system including those
removed from the free lists.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 mm/vmalloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)