diff mbox series

mm: cachestat: avoid bogus workingset test during swapping & invalidation races

Message ID 20240314164941.580454-1-hannes@cmpxchg.org (mailing list archive)
State New
Headers show
Series mm: cachestat: avoid bogus workingset test during swapping & invalidation races | expand

Commit Message

Johannes Weiner March 14, 2024, 4:49 p.m. UTC
When cachestat against shmem races with swapping and invalidation, the
shadow entry might not exist: swapout IO is still in progress and
we're before __remove_mapping; or swapin/invalidation/swapoff has
removed the shadow from swapcache after we saw a shmem swap entry.

This will send a NULL to workingset_test_recent(). The latter purely
operates on pointer bits, so it won't crash - node 0, memcg ID 0,
eviction timestamp 0, etc. are all valid inputs - but it's a bogus
test. In theory that could result in a false "recently evicted" count.

Such a false positive wouldn't be the end of the world. But for code
clarity and (future) robustness, be explicit about this case.

Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/filemap.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chengming Zhou March 15, 2024, 3:16 a.m. UTC | #1
On 2024/3/15 00:49, Johannes Weiner wrote:
> When cachestat against shmem races with swapping and invalidation, the
> shadow entry might not exist: swapout IO is still in progress and
> we're before __remove_mapping; or swapin/invalidation/swapoff has
> removed the shadow from swapcache after we saw a shmem swap entry.
> 
> This will send a NULL to workingset_test_recent(). The latter purely
> operates on pointer bits, so it won't crash - node 0, memcg ID 0,
> eviction timestamp 0, etc. are all valid inputs - but it's a bogus
> test. In theory that could result in a false "recently evicted" count.

Good catch!

> 
> Such a false positive wouldn't be the end of the world. But for code
> clarity and (future) robustness, be explicit about this case.
> 
> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> Reported-by: Jann Horn <jannh@google.com>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/filemap.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 222adac7c9c5..a07c27df7eab 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
>  				swp_entry_t swp = radix_to_swp_entry(folio);
>  

IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
Since there maybe other types of entries in shmem. And need to get_swap_device() to
prevent concurrent swapoff here, get_shadow_from_swap_cache() won't do it for us.

Thanks.

>  				shadow = get_shadow_from_swap_cache(swp);
> +				/* can race with swapping & invalidation */
> +				if (!shadow)
> +					goto resched;
>  			}
>  #endif
>  			if (workingset_test_recent(shadow, true, &workingset))
Johannes Weiner March 15, 2024, 9:30 a.m. UTC | #2
On Fri, Mar 15, 2024 at 11:16:35AM +0800, Chengming Zhou wrote:
> On 2024/3/15 00:49, Johannes Weiner wrote:
> > When cachestat against shmem races with swapping and invalidation, the
> > shadow entry might not exist: swapout IO is still in progress and
> > we're before __remove_mapping; or swapin/invalidation/swapoff has
> > removed the shadow from swapcache after we saw a shmem swap entry.
> > 
> > This will send a NULL to workingset_test_recent(). The latter purely
> > operates on pointer bits, so it won't crash - node 0, memcg ID 0,
> > eviction timestamp 0, etc. are all valid inputs - but it's a bogus
> > test. In theory that could result in a false "recently evicted" count.
> 
> Good catch!
> 
> > 
> > Such a false positive wouldn't be the end of the world. But for code
> > clarity and (future) robustness, be explicit about this case.
> > 
> > Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
> > Reported-by: Jann Horn <jannh@google.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> >  mm/filemap.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 222adac7c9c5..a07c27df7eab 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
> >  				swp_entry_t swp = radix_to_swp_entry(folio);
> >  
> 
> IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
> Since there maybe other types of entries in shmem.

Good point, it could be a poisoned entry. I'll add the
non_swap_entry() check on swp.

> And need to get_swap_device() to prevent concurrent swapoff here,
> get_shadow_from_swap_cache() won't do it for us.

We're holding rcu_read_lock() for the xarray iteration, so if we see
the swap entry in the shmem mapping, it means we beat shmem_unuse()
and swapoff hasn't run synchronize_rcu() yet.

So it's safe. But I think it could use a comment. Maybe the
documentation of get_swap_device() should mention this option too?
Chengming Zhou March 15, 2024, 9:47 a.m. UTC | #3
On 2024/3/15 17:30, Johannes Weiner wrote:
> On Fri, Mar 15, 2024 at 11:16:35AM +0800, Chengming Zhou wrote:
>> On 2024/3/15 00:49, Johannes Weiner wrote:
>>> When cachestat against shmem races with swapping and invalidation, the
>>> shadow entry might not exist: swapout IO is still in progress and
>>> we're before __remove_mapping; or swapin/invalidation/swapoff has
>>> removed the shadow from swapcache after we saw a shmem swap entry.
>>>
>>> This will send a NULL to workingset_test_recent(). The latter purely
>>> operates on pointer bits, so it won't crash - node 0, memcg ID 0,
>>> eviction timestamp 0, etc. are all valid inputs - but it's a bogus
>>> test. In theory that could result in a false "recently evicted" count.
>>
>> Good catch!
>>
>>>
>>> Such a false positive wouldn't be the end of the world. But for code
>>> clarity and (future) robustness, be explicit about this case.
>>>
>>> Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
>>> Reported-by: Jann Horn <jannh@google.com>
>>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>>> ---
>>>  mm/filemap.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 222adac7c9c5..a07c27df7eab 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -4199,6 +4199,9 @@ static void filemap_cachestat(struct address_space *mapping,
>>>  				swp_entry_t swp = radix_to_swp_entry(folio);
>>>  
>>
>> IIUC, we should first check if it's a real swap entry using non_swap_entry(), right?
>> Since there maybe other types of entries in shmem.
> 
> Good point, it could be a poisoned entry. I'll add the
> non_swap_entry() check on swp.
> 
>> And need to get_swap_device() to prevent concurrent swapoff here,
>> get_shadow_from_swap_cache() won't do it for us.
> 
> We're holding rcu_read_lock() for the xarray iteration, so if we see
> the swap entry in the shmem mapping, it means we beat shmem_unuse()
> and swapoff hasn't run synchronize_rcu() yet.

Ah, you are right, so it's safe.

> 
> So it's safe. But I think it could use a comment. Maybe the
> documentation of get_swap_device() should mention this option too?
diff mbox series

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index 222adac7c9c5..a07c27df7eab 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4199,6 +4199,9 @@  static void filemap_cachestat(struct address_space *mapping,
 				swp_entry_t swp = radix_to_swp_entry(folio);
 
 				shadow = get_shadow_from_swap_cache(swp);
+				/* can race with swapping & invalidation */
+				if (!shadow)
+					goto resched;
 			}
 #endif
 			if (workingset_test_recent(shadow, true, &workingset))