diff mbox series

[1/4] mm: allow guard regions in file-backed and read-only mappings

Message ID d885cb259174736c2830a5dfe07f81b214ef3faa.1739469950.git.lorenzo.stoakes@oracle.com (mailing list archive)
State New
Headers show
Series mm: permit guard regions for file-backed/shmem mappings | expand

Commit Message

Lorenzo Stoakes Feb. 13, 2025, 6:17 p.m. UTC
There is no reason to disallow guard regions in file-backed mappings -
readahead and fault-around both function correctly in the presence of PTE
markers, equally other operations relating to memory-mapped files function
correctly.

Additionally, read-only mappings if introducing guard-regions, only
restrict the mapping further, which means there is no violation of any
access rights by permitting this to be so.

Removing this restriction allows for read-only mapped files (such as
executable files) correctly which would otherwise not be permitted.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/madvise.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Vlastimil Babka Feb. 18, 2025, 2:15 p.m. UTC | #1
On 2/13/25 19:17, Lorenzo Stoakes wrote:
> There is no reason to disallow guard regions in file-backed mappings -
> readahead and fault-around both function correctly in the presence of PTE
> markers, equally other operations relating to memory-mapped files function
> correctly.
> 
> Additionally, read-only mappings if introducing guard-regions, only
> restrict the mapping further, which means there is no violation of any
> access rights by permitting this to be so.
> 
> Removing this restriction allows for read-only mapped files (such as
> executable files) correctly which would otherwise not be permitted.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/madvise.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6ecead476a80..e01e93e179a8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>  	if (!allow_locked)
>  		disallowed |= VM_LOCKED;
>  
> -	if (!vma_is_anonymous(vma))
> -		return false;
> -
> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> -		return false;
> -
> -	return true;
> +	return !(vma->vm_flags & disallowed);
>  }
>  
>  static bool is_guard_pte_marker(pte_t ptent)
David Hildenbrand Feb. 18, 2025, 4:01 p.m. UTC | #2
On 13.02.25 19:17, Lorenzo Stoakes wrote:
> There is no reason to disallow guard regions in file-backed mappings -
> readahead and fault-around both function correctly in the presence of PTE
> markers, equally other operations relating to memory-mapped files function
> correctly.
> 
> Additionally, read-only mappings if introducing guard-regions, only
> restrict the mapping further, which means there is no violation of any
> access rights by permitting this to be so.
> 
> Removing this restriction allows for read-only mapped files (such as
> executable files) correctly which would otherwise not be permitted.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>   mm/madvise.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 6ecead476a80..e01e93e179a8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>   	if (!allow_locked)
>   		disallowed |= VM_LOCKED;
>   
> -	if (!vma_is_anonymous(vma))
> -		return false;
> -
> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> -		return false;
> -
> -	return true;
> +	return !(vma->vm_flags & disallowed);
>   }
>   
>   static bool is_guard_pte_marker(pte_t ptent)

Acked-by: David Hildenbrand <david@redhat.com>

I assume these markers cannot completely prevent us from allocating 
pages/folios for these underlying file/pageache ranges of these markers 
in case of shmem during page faults, right?
Lorenzo Stoakes Feb. 18, 2025, 4:12 p.m. UTC | #3
On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > There is no reason to disallow guard regions in file-backed mappings -
> > readahead and fault-around both function correctly in the presence of PTE
> > markers, equally other operations relating to memory-mapped files function
> > correctly.
> >
> > Additionally, read-only mappings if introducing guard-regions, only
> > restrict the mapping further, which means there is no violation of any
> > access rights by permitting this to be so.
> >
> > Removing this restriction allows for read-only mapped files (such as
> > executable files) correctly which would otherwise not be permitted.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >   mm/madvise.c | 8 +-------
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 6ecead476a80..e01e93e179a8 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> >   	if (!allow_locked)
> >   		disallowed |= VM_LOCKED;
> > -	if (!vma_is_anonymous(vma))
> > -		return false;
> > -
> > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > -		return false;
> > -
> > -	return true;
> > +	return !(vma->vm_flags & disallowed);
> >   }
> >   static bool is_guard_pte_marker(pte_t ptent)
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!

>
> I assume these markers cannot completely prevent us from allocating
> pages/folios for these underlying file/pageache ranges of these markers in
> case of shmem during page faults, right?

If the markers are in place, then page faulting will result in a
segfault. If we faulted in a shmem page then installed markers (which would
zap the range), then the page cache will be populated, but obviously
subject to standard reclaim.

If we perform synchronous readahead prior to a guard region that includes
(partially or fully) a guard region we might major fault entries into the
page cache that are then not accessable _from that mapping_, this is rather
unavoidable as this doesn't account for page table mappings and should be
largely trivial overhead (also these folios are reclaimable).

>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 18, 2025, 4:17 p.m. UTC | #4
On 18.02.25 17:12, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>> There is no reason to disallow guard regions in file-backed mappings -
>>> readahead and fault-around both function correctly in the presence of PTE
>>> markers, equally other operations relating to memory-mapped files function
>>> correctly.
>>>
>>> Additionally, read-only mappings if introducing guard-regions, only
>>> restrict the mapping further, which means there is no violation of any
>>> access rights by permitting this to be so.
>>>
>>> Removing this restriction allows for read-only mapped files (such as
>>> executable files) correctly which would otherwise not be permitted.
>>>
>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> ---
>>>    mm/madvise.c | 8 +-------
>>>    1 file changed, 1 insertion(+), 7 deletions(-)
>>>
>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>> index 6ecead476a80..e01e93e179a8 100644
>>> --- a/mm/madvise.c
>>> +++ b/mm/madvise.c
>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>    	if (!allow_locked)
>>>    		disallowed |= VM_LOCKED;
>>> -	if (!vma_is_anonymous(vma))
>>> -		return false;
>>> -
>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>> -		return false;
>>> -
>>> -	return true;
>>> +	return !(vma->vm_flags & disallowed);
>>>    }
>>>    static bool is_guard_pte_marker(pte_t ptent)
>>
>> Acked-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
>>
>> I assume these markers cannot completely prevent us from allocating
>> pages/folios for these underlying file/pageache ranges of these markers in
>> case of shmem during page faults, right?
> 
> If the markers are in place, then page faulting will result in a
> segfault. If we faulted in a shmem page then installed markers (which would
> zap the range), then the page cache will be populated, but obviously
> subject to standard reclaim.

Well, yes, (a) if there is swap and (b), if the noswap option was not 
specified for tmpfs.

Okay, so installing a guard entry might require punshing a hole to get 
rid of any already-existing memory. But readahead (below) might mess it up.

> 
> If we perform synchronous readahead prior to a guard region that includes
> (partially or fully) a guard region we might major fault entries into the
> page cache that are then not accessable _from that mapping_, this is rather
> unavoidable as this doesn't account for page table mappings and should be
> largely trivial overhead (also these folios are reclaimable).

Right, that's what I had in mind: assume I have a single marker in a 
PMD, shmem might allocate a PMD THP to back that region, ignoring the 
marker hint. (so I think)

Thanks!
Lorenzo Stoakes Feb. 18, 2025, 4:21 p.m. UTC | #5
On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
> On 18.02.25 17:12, Lorenzo Stoakes wrote:
> > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> > > On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > > > There is no reason to disallow guard regions in file-backed mappings -
> > > > readahead and fault-around both function correctly in the presence of PTE
> > > > markers, equally other operations relating to memory-mapped files function
> > > > correctly.
> > > >
> > > > Additionally, read-only mappings if introducing guard-regions, only
> > > > restrict the mapping further, which means there is no violation of any
> > > > access rights by permitting this to be so.
> > > >
> > > > Removing this restriction allows for read-only mapped files (such as
> > > > executable files) correctly which would otherwise not be permitted.
> > > >
> > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > ---
> > > >    mm/madvise.c | 8 +-------
> > > >    1 file changed, 1 insertion(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > index 6ecead476a80..e01e93e179a8 100644
> > > > --- a/mm/madvise.c
> > > > +++ b/mm/madvise.c
> > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> > > >    	if (!allow_locked)
> > > >    		disallowed |= VM_LOCKED;
> > > > -	if (!vma_is_anonymous(vma))
> > > > -		return false;
> > > > -
> > > > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > > > -		return false;
> > > > -
> > > > -	return true;
> > > > +	return !(vma->vm_flags & disallowed);
> > > >    }
> > > >    static bool is_guard_pte_marker(pte_t ptent)
> > >
> > > Acked-by: David Hildenbrand <david@redhat.com>
> >
> > Thanks!
> >
> > >
> > > I assume these markers cannot completely prevent us from allocating
> > > pages/folios for these underlying file/pageache ranges of these markers in
> > > case of shmem during page faults, right?
> >
> > If the markers are in place, then page faulting will result in a
> > segfault. If we faulted in a shmem page then installed markers (which would
> > zap the range), then the page cache will be populated, but obviously
> > subject to standard reclaim.
>
> Well, yes, (a) if there is swap and (b), if the noswap option was not
> specified for tmpfs.
>

Right, yeah if you don't have it set up such that dropping a reference to the
folio doesn't drop the page altogether.

I think this matches expectation though in that you'd get the same results from
an MADV_DONTNEED followed by faulting the page again.

> Okay, so installing a guard entry might require punshing a hole to get rid
> of any already-existing memory. But readahead (below) might mess it up.

Only if you are so concerned about avoiding the page cache being populated there
that you want to do this :)

Readahead I think will not readahead into a holepunched region as the hole
punching extends to the fs layer _I believe_ I have not checked the code for
this, but I believe it actually changes the underlying file too right to say
'this part of the file is empty'?

(I did explicitly test hole punching with guard regions btw, by the by :)

>
> >
> > If we perform synchronous readahead prior to a guard region that includes
> > (partially or fully) a guard region we might major fault entries into the
> > page cache that are then not accessable _from that mapping_, this is rather
> > unavoidable as this doesn't account for page table mappings and should be
> > largely trivial overhead (also these folios are reclaimable).
>
> Right, that's what I had in mind: assume I have a single marker in a PMD,
> shmem might allocate a PMD THP to back that region, ignoring the marker
> hint. (so I think)
>
> Thanks!
>
> --
> Cheers,
>
> David / dhildenb
>
>
David Hildenbrand Feb. 18, 2025, 4:27 p.m. UTC | #6
On 18.02.25 17:21, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
>> On 18.02.25 17:12, Lorenzo Stoakes wrote:
>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>>>> There is no reason to disallow guard regions in file-backed mappings -
>>>>> readahead and fault-around both function correctly in the presence of PTE
>>>>> markers, equally other operations relating to memory-mapped files function
>>>>> correctly.
>>>>>
>>>>> Additionally, read-only mappings if introducing guard-regions, only
>>>>> restrict the mapping further, which means there is no violation of any
>>>>> access rights by permitting this to be so.
>>>>>
>>>>> Removing this restriction allows for read-only mapped files (such as
>>>>> executable files) correctly which would otherwise not be permitted.
>>>>>
>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>> ---
>>>>>     mm/madvise.c | 8 +-------
>>>>>     1 file changed, 1 insertion(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>> index 6ecead476a80..e01e93e179a8 100644
>>>>> --- a/mm/madvise.c
>>>>> +++ b/mm/madvise.c
>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>>>     	if (!allow_locked)
>>>>>     		disallowed |= VM_LOCKED;
>>>>> -	if (!vma_is_anonymous(vma))
>>>>> -		return false;
>>>>> -
>>>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>>>> -		return false;
>>>>> -
>>>>> -	return true;
>>>>> +	return !(vma->vm_flags & disallowed);
>>>>>     }
>>>>>     static bool is_guard_pte_marker(pte_t ptent)
>>>>
>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>
>>> Thanks!
>>>
>>>>
>>>> I assume these markers cannot completely prevent us from allocating
>>>> pages/folios for these underlying file/pageache ranges of these markers in
>>>> case of shmem during page faults, right?
>>>
>>> If the markers are in place, then page faulting will result in a
>>> segfault. If we faulted in a shmem page then installed markers (which would
>>> zap the range), then the page cache will be populated, but obviously
>>> subject to standard reclaim.
>>
>> Well, yes, (a) if there is swap and (b), if the noswap option was not
>> specified for tmpfs.
>>
> 
> Right, yeah if you don't have it set up such that dropping a reference to the
> folio doesn't drop the page altogether.
> 
> I think this matches expectation though in that you'd get the same results from
> an MADV_DONTNEED followed by faulting the page again.

It might make sense to document that: installing a guard behaves just 
like MADV_DONTNEED; in case of a file, that means that the pagecache is 
left untouched.

> 
>> Okay, so installing a guard entry might require punshing a hole to get rid
>> of any already-existing memory. But readahead (below) might mess it up.
> 
> Only if you are so concerned about avoiding the page cache being populated there
> that you want to do this :)
> 
> Readahead I think will not readahead into a holepunched region as the hole
> punching extends to the fs layer _I believe_ I have not checked the code for
> this, but I believe it actually changes the underlying file too right to say
> 'this part of the file is empty'?

Well, we are talking about shmem here ... not your ordinary fs backed by 
an actual file :)
Lorenzo Stoakes Feb. 18, 2025, 4:49 p.m. UTC | #7
On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote:
> On 18.02.25 17:21, Lorenzo Stoakes wrote:
> > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
> > > On 18.02.25 17:12, Lorenzo Stoakes wrote:
> > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > > > > > There is no reason to disallow guard regions in file-backed mappings -
> > > > > > readahead and fault-around both function correctly in the presence of PTE
> > > > > > markers, equally other operations relating to memory-mapped files function
> > > > > > correctly.
> > > > > >
> > > > > > Additionally, read-only mappings if introducing guard-regions, only
> > > > > > restrict the mapping further, which means there is no violation of any
> > > > > > access rights by permitting this to be so.
> > > > > >
> > > > > > Removing this restriction allows for read-only mapped files (such as
> > > > > > executable files) correctly which would otherwise not be permitted.
> > > > > >
> > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > > > ---
> > > > > >     mm/madvise.c | 8 +-------
> > > > > >     1 file changed, 1 insertion(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > index 6ecead476a80..e01e93e179a8 100644
> > > > > > --- a/mm/madvise.c
> > > > > > +++ b/mm/madvise.c
> > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> > > > > >     	if (!allow_locked)
> > > > > >     		disallowed |= VM_LOCKED;
> > > > > > -	if (!vma_is_anonymous(vma))
> > > > > > -		return false;
> > > > > > -
> > > > > > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > > > > > -		return false;
> > > > > > -
> > > > > > -	return true;
> > > > > > +	return !(vma->vm_flags & disallowed);
> > > > > >     }
> > > > > >     static bool is_guard_pte_marker(pte_t ptent)
> > > > >
> > > > > Acked-by: David Hildenbrand <david@redhat.com>
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > > I assume these markers cannot completely prevent us from allocating
> > > > > pages/folios for these underlying file/pageache ranges of these markers in
> > > > > case of shmem during page faults, right?
> > > >
> > > > If the markers are in place, then page faulting will result in a
> > > > segfault. If we faulted in a shmem page then installed markers (which would
> > > > zap the range), then the page cache will be populated, but obviously
> > > > subject to standard reclaim.
> > >
> > > Well, yes, (a) if there is swap and (b), if the noswap option was not
> > > specified for tmpfs.
> > >
> >
> > Right, yeah if you don't have it set up such that dropping a reference to the
> > folio doesn't drop the page altogether.
> >
> > I think this matches expectation though in that you'd get the same results from
> > an MADV_DONTNEED followed by faulting the page again.
>
> It might make sense to document that: installing a guard behaves just like
> MADV_DONTNEED; in case of a file, that means that the pagecache is left
> untouched.

More docs noooo! :P I will update the man pages when this is more obviously
heading for landing in 6.15 accordingly.

Current man page documentation on this is:

'If the region maps memory pages those mappings will be replaced as part of
the operation'

I think something like:

'If the region maps pages those mappings will be replaced as part of the
operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting
in the page will behave as if that region had MADV_DONTNEED applied to it,
that is anonymous ranges will be backed by newly allocated zeroed pages and
file-backed ranges will be backed by the underlying file pages.'

Probably something less wordy than this...

>
> >
> > > Okay, so installing a guard entry might require punshing a hole to get rid
> > > of any already-existing memory. But readahead (below) might mess it up.
> >
> > Only if you are so concerned about avoiding the page cache being populated there
> > that you want to do this :)
> >
> > Readahead I think will not readahead into a holepunched region as the hole
> > punching extends to the fs layer _I believe_ I have not checked the code for
> > this, but I believe it actually changes the underlying file too right to say
> > 'this part of the file is empty'?
>
> Well, we are talking about shmem here ... not your ordinary fs backed by an
> actual file :)

I am talking about both, I multitask ;)

>
> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Feb. 18, 2025, 5 p.m. UTC | #8
On 18.02.25 17:49, Lorenzo Stoakes wrote:
> On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote:
>> On 18.02.25 17:21, Lorenzo Stoakes wrote:
>>> On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
>>>> On 18.02.25 17:12, Lorenzo Stoakes wrote:
>>>>> On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
>>>>>> On 13.02.25 19:17, Lorenzo Stoakes wrote:
>>>>>>> There is no reason to disallow guard regions in file-backed mappings -
>>>>>>> readahead and fault-around both function correctly in the presence of PTE
>>>>>>> markers, equally other operations relating to memory-mapped files function
>>>>>>> correctly.
>>>>>>>
>>>>>>> Additionally, read-only mappings if introducing guard-regions, only
>>>>>>> restrict the mapping further, which means there is no violation of any
>>>>>>> access rights by permitting this to be so.
>>>>>>>
>>>>>>> Removing this restriction allows for read-only mapped files (such as
>>>>>>> executable files) correctly which would otherwise not be permitted.
>>>>>>>
>>>>>>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>>>>>> ---
>>>>>>>      mm/madvise.c | 8 +-------
>>>>>>>      1 file changed, 1 insertion(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>>>>> index 6ecead476a80..e01e93e179a8 100644
>>>>>>> --- a/mm/madvise.c
>>>>>>> +++ b/mm/madvise.c
>>>>>>> @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
>>>>>>>      	if (!allow_locked)
>>>>>>>      		disallowed |= VM_LOCKED;
>>>>>>> -	if (!vma_is_anonymous(vma))
>>>>>>> -		return false;
>>>>>>> -
>>>>>>> -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
>>>>>>> -		return false;
>>>>>>> -
>>>>>>> -	return true;
>>>>>>> +	return !(vma->vm_flags & disallowed);
>>>>>>>      }
>>>>>>>      static bool is_guard_pte_marker(pte_t ptent)
>>>>>>
>>>>>> Acked-by: David Hildenbrand <david@redhat.com>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> I assume these markers cannot completely prevent us from allocating
>>>>>> pages/folios for these underlying file/pageache ranges of these markers in
>>>>>> case of shmem during page faults, right?
>>>>>
>>>>> If the markers are in place, then page faulting will result in a
>>>>> segfault. If we faulted in a shmem page then installed markers (which would
>>>>> zap the range), then the page cache will be populated, but obviously
>>>>> subject to standard reclaim.
>>>>
>>>> Well, yes, (a) if there is swap and (b), if the noswap option was not
>>>> specified for tmpfs.
>>>>
>>>
>>> Right, yeah if you don't have it set up such that dropping a reference to the
>>> folio doesn't drop the page altogether.
>>>
>>> I think this matches expectation though in that you'd get the same results from
>>> an MADV_DONTNEED followed by faulting the page again.
>>
>> It might make sense to document that: installing a guard behaves just like
>> MADV_DONTNEED; in case of a file, that means that the pagecache is left
>> untouched.
> 
> More docs noooo! :P I will update the man pages when this is more obviously
> heading for landing in 6.15 accordingly.
> 
> Current man page documentation on this is:
> 
> 'If the region maps memory pages those mappings will be replaced as part of
> the operation'
> 
> I think something like:
> 
> 'If the region maps pages those mappings will be replaced as part of the
> operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting
> in the page will behave as if that region had MADV_DONTNEED applied to it,
> that is anonymous ranges will be backed by newly allocated zeroed pages and
> file-backed ranges will be backed by the underlying file pages.'
> 
> Probably something less wordy than this...

Yeah, but sounds good to me.

> 
>>
>>>
>>>> Okay, so installing a guard entry might require punshing a hole to get rid
>>>> of any already-existing memory. But readahead (below) might mess it up.
>>>
>>> Only if you are so concerned about avoiding the page cache being populated there
>>> that you want to do this :)
>>>
>>> Readahead I think will not readahead into a holepunched region as the hole
>>> punching extends to the fs layer _I believe_ I have not checked the code for
>>> this, but I believe it actually changes the underlying file too right to say
>>> 'this part of the file is empty'?
>>
>> Well, we are talking about shmem here ... not your ordinary fs backed by an
>> actual file :)
> 
> I am talking about both, I multitask ;)

For !shmem, we should indeed not be messing with a sparse file structure.
Lorenzo Stoakes Feb. 18, 2025, 5:04 p.m. UTC | #9
On Tue, Feb 18, 2025 at 06:00:36PM +0100, David Hildenbrand wrote:
> On 18.02.25 17:49, Lorenzo Stoakes wrote:
> > On Tue, Feb 18, 2025 at 05:27:24PM +0100, David Hildenbrand wrote:
> > > On 18.02.25 17:21, Lorenzo Stoakes wrote:
> > > > On Tue, Feb 18, 2025 at 05:17:20PM +0100, David Hildenbrand wrote:
> > > > > On 18.02.25 17:12, Lorenzo Stoakes wrote:
> > > > > > On Tue, Feb 18, 2025 at 05:01:16PM +0100, David Hildenbrand wrote:
> > > > > > > On 13.02.25 19:17, Lorenzo Stoakes wrote:
> > > > > > > > There is no reason to disallow guard regions in file-backed mappings -
> > > > > > > > readahead and fault-around both function correctly in the presence of PTE
> > > > > > > > markers, equally other operations relating to memory-mapped files function
> > > > > > > > correctly.
> > > > > > > >
> > > > > > > > Additionally, read-only mappings if introducing guard-regions, only
> > > > > > > > restrict the mapping further, which means there is no violation of any
> > > > > > > > access rights by permitting this to be so.
> > > > > > > >
> > > > > > > > Removing this restriction allows for read-only mapped files (such as
> > > > > > > > executable files) correctly which would otherwise not be permitted.
> > > > > > > >
> > > > > > > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > > > > > ---
> > > > > > > >      mm/madvise.c | 8 +-------
> > > > > > > >      1 file changed, 1 insertion(+), 7 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/mm/madvise.c b/mm/madvise.c
> > > > > > > > index 6ecead476a80..e01e93e179a8 100644
> > > > > > > > --- a/mm/madvise.c
> > > > > > > > +++ b/mm/madvise.c
> > > > > > > > @@ -1051,13 +1051,7 @@ static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
> > > > > > > >      	if (!allow_locked)
> > > > > > > >      		disallowed |= VM_LOCKED;
> > > > > > > > -	if (!vma_is_anonymous(vma))
> > > > > > > > -		return false;
> > > > > > > > -
> > > > > > > > -	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
> > > > > > > > -		return false;
> > > > > > > > -
> > > > > > > > -	return true;
> > > > > > > > +	return !(vma->vm_flags & disallowed);
> > > > > > > >      }
> > > > > > > >      static bool is_guard_pte_marker(pte_t ptent)
> > > > > > >
> > > > > > > Acked-by: David Hildenbrand <david@redhat.com>
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >
> > > > > > > I assume these markers cannot completely prevent us from allocating
> > > > > > > pages/folios for these underlying file/pageache ranges of these markers in
> > > > > > > case of shmem during page faults, right?
> > > > > >
> > > > > > If the markers are in place, then page faulting will result in a
> > > > > > segfault. If we faulted in a shmem page then installed markers (which would
> > > > > > zap the range), then the page cache will be populated, but obviously
> > > > > > subject to standard reclaim.
> > > > >
> > > > > Well, yes, (a) if there is swap and (b), if the noswap option was not
> > > > > specified for tmpfs.
> > > > >
> > > >
> > > > Right, yeah if you don't have it set up such that dropping a reference to the
> > > > folio doesn't drop the page altogether.
> > > >
> > > > I think this matches expectation though in that you'd get the same results from
> > > > an MADV_DONTNEED followed by faulting the page again.
> > >
> > > It might make sense to document that: installing a guard behaves just like
> > > MADV_DONTNEED; in case of a file, that means that the pagecache is left
> > > untouched.
> >
> > More docs noooo! :P I will update the man pages when this is more obviously
> > heading for landing in 6.15 accordingly.
> >
> > Current man page documentation on this is:
> >
> > 'If the region maps memory pages those mappings will be replaced as part of
> > the operation'
> >
> > I think something like:
> >
> > 'If the region maps pages those mappings will be replaced as part of the
> > operation. When guard regions are removed via MADV_GUARD_REMOVE, faulting
> > in the page will behave as if that region had MADV_DONTNEED applied to it,
> > that is anonymous ranges will be backed by newly allocated zeroed pages and
> > file-backed ranges will be backed by the underlying file pages.'
> >
> > Probably something less wordy than this...
>
> Yeah, but sounds good to me.
>
> >
> > >
> > > >
> > > > > Okay, so installing a guard entry might require punshing a hole to get rid
> > > > > of any already-existing memory. But readahead (below) might mess it up.
> > > >
> > > > Only if you are so concerned about avoiding the page cache being populated there
> > > > that you want to do this :)
> > > >
> > > > Readahead I think will not readahead into a holepunched region as the hole
> > > > punching extends to the fs layer _I believe_ I have not checked the code for
> > > > this, but I believe it actually changes the underlying file too right to say
> > > > 'this part of the file is empty'?
> > >
> > > Well, we are talking about shmem here ... not your ordinary fs backed by an
> > > actual file :)
> >
> > I am talking about both, I multitask ;)
>
> For !shmem, we should indeed not be messing with a sparse file structure.

Ah right I get you, sorry missed your point here. Yeah MADV_REMOVE for shmem
will just explicitly drop the pages which doesn't have any underlying file to
impact as with actually file-backed memory.

>
>
> --
> Cheers,
>
> David / dhildenb
>
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 6ecead476a80..e01e93e179a8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1051,13 +1051,7 @@  static bool is_valid_guard_vma(struct vm_area_struct *vma, bool allow_locked)
 	if (!allow_locked)
 		disallowed |= VM_LOCKED;
 
-	if (!vma_is_anonymous(vma))
-		return false;
-
-	if ((vma->vm_flags & (VM_MAYWRITE | disallowed)) != VM_MAYWRITE)
-		return false;
-
-	return true;
+	return !(vma->vm_flags & disallowed);
 }
 
 static bool is_guard_pte_marker(pte_t ptent)