diff mbox series

[v1] mm/madvise: Always set ptes via arch helpers

Message ID 20250307123307.262298-1-ryan.roberts@arm.com (mailing list archive)
State New
Headers show
Series [v1] mm/madvise: Always set ptes via arch helpers | expand

Commit Message

Ryan Roberts March 7, 2025, 12:33 p.m. UTC
Instead of writing a pte directly into the table, use the set_pte_at()
helper, which gives the arch visibility of the change.

In this instance we are guaranteed that the pte was originally none and
is being modified to a not-present pte, so there was unlikely to be a
bug in practice (at least not on arm64). But it's bad practice to write
the page table memory directly without arch involvement.

Cc: <stable@vger.kernel.org>
Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.43.0

Comments

Lorenzo Stoakes March 7, 2025, 1:04 p.m. UTC | #1
On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
> Instead of writing a pte directly into the table, use the set_pte_at()
> helper, which gives the arch visibility of the change.
>
> In this instance we are guaranteed that the pte was originally none and
> is being modified to a not-present pte, so there was unlikely to be a
> bug in practice (at least not on arm64). But it's bad practice to write
> the page table memory directly without arch involvement.
>
> Cc: <stable@vger.kernel.org>
> Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 388dc289b5d1..6170f4acc14f 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next,
>  	unsigned long *nr_pages = (unsigned long *)walk->private;
>
>  	/* Simply install a PTE marker, this causes segfault on access. */
> -	*ptep = make_pte_marker(PTE_MARKER_GUARD);
> +	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));

I agree with you, but I think perhaps the arg name here is misleading :) If
you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():

		if (ops->install_pte && pte_none(ptep_get(pte))) {
			pte_t new_pte;

			err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte,
					       walk);
			if (err)
				break;

			set_pte_at(walk->mm, addr, pte, new_pte);

			...
		}

So the ptep being assigned here is a stack value, new_pte, which we simply
assign to, and _then_ the page walker code handles the set_pte_at() for us.

So we are indeed doing the right thing here, just in a different place :P

>  	(*nr_pages)++;
>
>  	return 0;
> --
> 2.43.0
>
Ryan Roberts March 7, 2025, 1:42 p.m. UTC | #2
On 07/03/2025 13:04, Lorenzo Stoakes wrote:
> On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
>> Instead of writing a pte directly into the table, use the set_pte_at()
>> helper, which gives the arch visibility of the change.
>>
>> In this instance we are guaranteed that the pte was originally none and
>> is being modified to a not-present pte, so there was unlikely to be a
>> bug in practice (at least not on arm64). But it's bad practice to write
>> the page table memory directly without arch involvement.
>>
>> Cc: <stable@vger.kernel.org>
>> Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/madvise.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 388dc289b5d1..6170f4acc14f 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next,
>>  	unsigned long *nr_pages = (unsigned long *)walk->private;
>>
>>  	/* Simply install a PTE marker, this causes segfault on access. */
>> -	*ptep = make_pte_marker(PTE_MARKER_GUARD);
>> +	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
> 
> I agree with you, but I think perhaps the arg name here is misleading :) If
> you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():
> 
> 		if (ops->install_pte && pte_none(ptep_get(pte))) {
> 			pte_t new_pte;
> 
> 			err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte,
> 					       walk);
> 			if (err)
> 				break;
> 
> 			set_pte_at(walk->mm, addr, pte, new_pte);
> 
> 			...
> 		}
> 
> So the ptep being assigned here is a stack value, new_pte, which we simply
> assign to, and _then_ the page walker code handles the set_pte_at() for us.
> 
> So we are indeed doing the right thing here, just in a different place :P

Ahh my bad. In that case, please ignore the patch.

But out of interest, why are you doing it like this? I find it a bit confusing
as all the other ops (e.g. pte_entry()) work directly on the pgtable's pte
without the intermediate.

Thanks,
Ryan

> 
>>  	(*nr_pages)++;
>>
>>  	return 0;
>> --
>> 2.43.0
>>
Lorenzo Stoakes March 7, 2025, 1:59 p.m. UTC | #3
On Fri, Mar 07, 2025 at 01:42:13PM +0000, Ryan Roberts wrote:
> On 07/03/2025 13:04, Lorenzo Stoakes wrote:
> > On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
> >> Instead of writing a pte directly into the table, use the set_pte_at()
> >> helper, which gives the arch visibility of the change.
> >>
> >> In this instance we are guaranteed that the pte was originally none and
> >> is being modified to a not-present pte, so there was unlikely to be a
> >> bug in practice (at least not on arm64). But it's bad practice to write
> >> the page table memory directly without arch involvement.
> >>
> >> Cc: <stable@vger.kernel.org>
> >> Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >>  mm/madvise.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/madvise.c b/mm/madvise.c
> >> index 388dc289b5d1..6170f4acc14f 100644
> >> --- a/mm/madvise.c
> >> +++ b/mm/madvise.c
> >> @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next,
> >>  	unsigned long *nr_pages = (unsigned long *)walk->private;
> >>
> >>  	/* Simply install a PTE marker, this causes segfault on access. */
> >> -	*ptep = make_pte_marker(PTE_MARKER_GUARD);
> >> +	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
> >
> > I agree with you, but I think perhaps the arg name here is misleading :) If
> > you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():
> >
> > 		if (ops->install_pte && pte_none(ptep_get(pte))) {
> > 			pte_t new_pte;
> >
> > 			err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte,
> > 					       walk);
> > 			if (err)
> > 				break;
> >
> > 			set_pte_at(walk->mm, addr, pte, new_pte);
> >
> > 			...
> > 		}
> >
> > So the ptep being assigned here is a stack value, new_pte, which we simply
> > assign to, and _then_ the page walker code handles the set_pte_at() for us.
> >
> > So we are indeed doing the right thing here, just in a different place :P
>
> Ahh my bad. In that case, please ignore the patch.
>
> But out of interest, why are you doing it like this? I find it a bit confusing
> as all the other ops (e.g. pte_entry()) work directly on the pgtable's pte
> without the intermediate.

In those cases it's read-only, the data's already there, you can just go ahead
and manipulate it (and would expect to be able to do so).

When setting things are a little different, I'd rather not open up things to a
user being able to do *whatever*, but rather limit to the smallest scope
possible for installing the PTE.

And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
the right thing re: arches :)

I could have named the parameter better though, in guard_install_pte_entry()
would be better to have called it 'new_pte' or something.

>
> Thanks,
> Ryan
>
> >
> >>  	(*nr_pages)++;
> >>
> >>  	return 0;
> >> --
> >> 2.43.0
> >>
>

Thanks for looking at this by the way, obviously I appreciate your point in
chasing up cases like this as endeavoured to do the right thing here, albeit
abstracted away :)
Ryan Roberts March 7, 2025, 2:35 p.m. UTC | #4
On 07/03/2025 13:59, Lorenzo Stoakes wrote:
> On Fri, Mar 07, 2025 at 01:42:13PM +0000, Ryan Roberts wrote:
>> On 07/03/2025 13:04, Lorenzo Stoakes wrote:
>>> On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
>>>> Instead of writing a pte directly into the table, use the set_pte_at()
>>>> helper, which gives the arch visibility of the change.
>>>>
>>>> In this instance we are guaranteed that the pte was originally none and
>>>> is being modified to a not-present pte, so there was unlikely to be a
>>>> bug in practice (at least not on arm64). But it's bad practice to write
>>>> the page table memory directly without arch involvement.
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>> Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  mm/madvise.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>> index 388dc289b5d1..6170f4acc14f 100644
>>>> --- a/mm/madvise.c
>>>> +++ b/mm/madvise.c
>>>> @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next,
>>>>  	unsigned long *nr_pages = (unsigned long *)walk->private;
>>>>
>>>>  	/* Simply install a PTE marker, this causes segfault on access. */
>>>> -	*ptep = make_pte_marker(PTE_MARKER_GUARD);
>>>> +	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
>>>
>>> I agree with you, but I think perhaps the arg name here is misleading :) If
>>> you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():
>>>
>>> 		if (ops->install_pte && pte_none(ptep_get(pte))) {
>>> 			pte_t new_pte;
>>>
>>> 			err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte,
>>> 					       walk);
>>> 			if (err)
>>> 				break;
>>>
>>> 			set_pte_at(walk->mm, addr, pte, new_pte);
>>>
>>> 			...
>>> 		}
>>>
>>> So the ptep being assigned here is a stack value, new_pte, which we simply
>>> assign to, and _then_ the page walker code handles the set_pte_at() for us.
>>>
>>> So we are indeed doing the right thing here, just in a different place :P
>>
>> Ahh my bad. In that case, please ignore the patch.
>>
>> But out of interest, why are you doing it like this? I find it a bit confusing
>> as all the other ops (e.g. pte_entry()) work directly on the pgtable's pte
>> without the intermediate.
> 
> In those cases it's read-only, the data's already there, you can just go ahead
> and manipulate it (and would expect to be able to do so).

It's certainly not read-only in general. Just having a quick look to verify, the
very first callback I landed on was clear_refs_pte_range(), which implements
.pmd_entry to clear the softdirty and access flags from a leaf pmd or from all
the child ptes.

> 
> When setting things are a little different, I'd rather not open up things to a
> user being able to do *whatever*, but rather limit to the smallest scope
> possible for installing the PTE.

Understandable, but personally I think it will lead to potential misunderstandings:

 - it will get copy/pasted as an example of how to set a pte (which is wrong;
you have to use set_pte_at()/set_ptes()). There is currently only a single other
case of direct dereferencing a pte to set it (in write_protect_page()).

 - new users of .install_pte may assume (like I did) that the passed in ptep is
pointing to the pgtable and they will manipulate it with arch helpers. arm64
arch helpers all assume they are only ever passed pointers into pgtable memory.
It will end horribly if that is not the case.

> 
> And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
> the right thing re: arches :)
> 
> I could have named the parameter better though, in guard_install_pte_entry()
> would be better to have called it 'new_pte' or something.

I'd suggest at least describing this in the documentation in pagewalk.h. Or
better yet, you could make the pte the return value for the function. Then it is
clear because you have no pointer. You'd lose the error code but the only user
of this currently can't fail anyway.

Anyway, just my 2 pence.

Thanks,
Ryan

> 
>>
>> Thanks,
>> Ryan
>>
>>>
>>>>  	(*nr_pages)++;
>>>>
>>>>  	return 0;
>>>> --
>>>> 2.43.0
>>>>
>>
> 
> Thanks for looking at this by the way, obviously I appreciate your point in
> chasing up cases like this as endeavoured to do the right thing here, albeit
> abstracted away :)
Lorenzo Stoakes March 7, 2025, 2:55 p.m. UTC | #5
+cc David

On Fri, Mar 07, 2025 at 02:35:12PM +0000, Ryan Roberts wrote:
> On 07/03/2025 13:59, Lorenzo Stoakes wrote:
> > On Fri, Mar 07, 2025 at 01:42:13PM +0000, Ryan Roberts wrote:
> >> On 07/03/2025 13:04, Lorenzo Stoakes wrote:
> >>> On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
> >>>> Instead of writing a pte directly into the table, use the set_pte_at()
> >>>> helper, which gives the arch visibility of the change.
> >>>>
> >>>> In this instance we are guaranteed that the pte was originally none and
> >>>> is being modified to a not-present pte, so there was unlikely to be a
> >>>> bug in practice (at least not on arm64). But it's bad practice to write
> >>>> the page table memory directly without arch involvement.
> >>>>
> >>>> Cc: <stable@vger.kernel.org>
> >>>> Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism")
> >>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>> ---
> >>>>  mm/madvise.c | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/madvise.c b/mm/madvise.c
> >>>> index 388dc289b5d1..6170f4acc14f 100644
> >>>> --- a/mm/madvise.c
> >>>> +++ b/mm/madvise.c
> >>>> @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next,
> >>>>  	unsigned long *nr_pages = (unsigned long *)walk->private;
> >>>>
> >>>>  	/* Simply install a PTE marker, this causes segfault on access. */
> >>>> -	*ptep = make_pte_marker(PTE_MARKER_GUARD);
> >>>> +	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
> >>>
> >>> I agree with you, but I think perhaps the arg name here is misleading :) If
> >>> you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():
> >>>
> >>> 		if (ops->install_pte && pte_none(ptep_get(pte))) {
> >>> 			pte_t new_pte;
> >>>
> >>> 			err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte,
> >>> 					       walk);
> >>> 			if (err)
> >>> 				break;
> >>>
> >>> 			set_pte_at(walk->mm, addr, pte, new_pte);
> >>>
> >>> 			...
> >>> 		}
> >>>
> >>> So the ptep being assigned here is a stack value, new_pte, which we simply
> >>> assign to, and _then_ the page walker code handles the set_pte_at() for us.
> >>>
> >>> So we are indeed doing the right thing here, just in a different place :P
> >>
> >> Ahh my bad. In that case, please ignore the patch.
> >>
> >> But out of interest, why are you doing it like this? I find it a bit confusing
> >> as all the other ops (e.g. pte_entry()) work directly on the pgtable's pte
> >> without the intermediate.
> >
> > In those cases it's read-only, the data's already there, you can just go ahead
> > and manipulate it (and would expect to be able to do so).
>
> It's certainly not read-only in general. Just having a quick look to verify, the
> very first callback I landed on was clear_refs_pte_range(), which implements
> .pmd_entry to clear the softdirty and access flags from a leaf pmd or from all
> the child ptes.

Yup sorry I misspoke, working some long hours atm so forgive me :) what I meant
to say is that we either read or modify existing.

And yes users do do potentially crazy things and yada yada.

David and I have spoken quite a few times about implementing generic page
table code that could help abstract a lot of things, and it feels like this
logic could all be rejigged in some fashion as to prevent the kind of
'everybody does their own handler' logic.q

I guess I felt it was more _dangerous_ as you are establishing _new_
mappings here, with the page tables being constructed for you up to the PTE
level.

And wanted to 'lock things down' somewhat.

But indeed, all this cries out for a need for a more generalised, robust
interface that handles some of what the downstream users of this are doing.

>
> >
> > When setting things are a little different, I'd rather not open up things to a
> > user being able to do *whatever*, but rather limit to the smallest scope
> > possible for installing the PTE.
>
> Understandable, but personally I think it will lead to potential misunderstandings:
>
>  - it will get copy/pasted as an example of how to set a pte (which is wrong;
> you have to use set_pte_at()/set_ptes()). There is currently only a single other
> case of direct dereferencing a pte to set it (in write_protect_page()).

Yeah, at least renaming the param could help, as 'ptep' implies you really
do have a pointer to the page table entry.

If we didn't return an error we could just return the PTE value or
something... hm.

>
>  - new users of .install_pte may assume (like I did) that the passed in ptep is
> pointing to the pgtable and they will manipulate it with arch helpers. arm64
> arch helpers all assume they are only ever passed pointers into pgtable memory.
> It will end horribly if that is not the case.

It will end very horribly indeed :P or perhaps with more of a fizzle than
anticipated...

>
> >
> > And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
> > the right thing re: arches :)
> >
> > I could have named the parameter better though, in guard_install_pte_entry()
> > would be better to have called it 'new_pte' or something.
>
> I'd suggest at least describing this in the documentation in pagewalk.h. Or
> better yet, you could make the pte the return value for the function. Then it is
> clear because you have no pointer. You'd lose the error code but the only user
> of this currently can't fail anyway.

Haha and here you make the same point I did above... great minds :)

I mean yeah returning a pte would make it clearer what you're doing, but
then it makes it different from every other callback... but this already is
different :)

I do very much want the ability to return an error value to stop the walk
(if you return >0 you can indicate to caller that a non-error stop occurred
for instance, something I use on the reading side).

But we do need to improve this one way or another, at the very least the
documentation/comments.

David - any thoughts?

I'm not necessarily against just making this consitent, but I like this
property of us controlling what happens instead of just giving a pointer
into the page table - the principle of exposing the least possible.

ANWYAY, I will add to my ever expanding whiteboard TODO list [literally the
only todo that work for me] to look at this, will definitely improve docs
at very least.


>
> Anyway, just my 2 pence.

Your input is very much appreciated! Though, with inflation, I think we had
better say 2 pounds... ;)

>
> Thanks,
> Ryan

Cheers!

>
> >
> >>
> >> Thanks,
> >> Ryan
> >>
> >>>
> >>>>  	(*nr_pages)++;
> >>>>
> >>>>  	return 0;
> >>>> --
> >>>> 2.43.0
> >>>>
> >>
> >
> > Thanks for looking at this by the way, obviously I appreciate your point in
> > chasing up cases like this as endeavoured to do the right thing here, albeit
> > abstracted away :)
>
Ryan Roberts March 7, 2025, 3:57 p.m. UTC | #6
On 07/03/2025 14:55, Lorenzo Stoakes wrote:
> I'm not necessarily against just making this consitent, but I like this
> property of us controlling what happens instead of just giving a pointer
> into the page table - the principle of exposing the least possible.

Given the function is called walk_page_range(), I do wonder how much
insulation/abstraction from the page tables is actually required.

But I think in general we are on the same page. Feel free to put looking at this
quite a long way down your todo list, it's certainly not getting in anyone's way
right now. But given it tripped me up, it will probably trip more people up
eventually.

Thanks,
Ryan
David Hildenbrand March 7, 2025, 5:43 p.m. UTC | #7
>> It's certainly not read-only in general. Just having a quick look to verify, the
>> very first callback I landed on was clear_refs_pte_range(), which implements
>> .pmd_entry to clear the softdirty and access flags from a leaf pmd or from all
>> the child ptes.
> 
> Yup sorry I misspoke, working some long hours atm so forgive me :) what I meant
> to say is that we either read or modify existing.
> 
> And yes users do do potentially crazy things and yada yada.
> 
> David and I have spoken quite a few times about implementing generic page
> table code that could help abstract a lot of things, and it feels like this
> logic could all be rejigged in some fashion as to prevent the kind of
> 'everybody does their own handler' logic.q

Oscar is currently prototyping a new pt walker that will batch entries 
(e.g., folio ranges, pte none ares), and not use indirect calls. The 
primary focus will will read-traversal, but nothing speaks against 
modifications (likely helpers for installing pte_none()->marker could be 
handy, and just creating page tables if we hit pmd_none() etc.).

Not sure yet how many use cases we can cover with the initial approach. 
But the idea is to start with something that works for many cases, to 
then gradually improve it.


> 
> I guess I felt it was more _dangerous_ as you are establishing _new_
> mappings here, with the page tables being constructed for you up to the PTE
> level.
> 
> And wanted to 'lock things down' somewhat.
> 
> But indeed, all this cries out for a need for a more generalised, robust
> interface that handles some of what the downstream users of this are doing.
> 
>>
>>>
>>> When setting things are a little different, I'd rather not open up things to a
>>> user being able to do *whatever*, but rather limit to the smallest scope
>>> possible for installing the PTE.
>>
>> Understandable, but personally I think it will lead to potential misunderstandings:
>>
>>   - it will get copy/pasted as an example of how to set a pte (which is wrong;
>> you have to use set_pte_at()/set_ptes()). There is currently only a single other
>> case of direct dereferencing a pte to set it (in write_protect_page()).
> 
> Yeah, at least renaming the param could help, as 'ptep' implies you really
> do have a pointer to the page table entry.
> 
> If we didn't return an error we could just return the PTE value or
> something... hm.
> 
>>
>>   - new users of .install_pte may assume (like I did) that the passed in ptep is
>> pointing to the pgtable and they will manipulate it with arch helpers. arm64
>> arch helpers all assume they are only ever passed pointers into pgtable memory.
>> It will end horribly if that is not the case.
> 
> It will end very horribly indeed :P or perhaps with more of a fizzle than
> anticipated...

Yes, I'm hoping we can avoid new users with the old api ... :)

> 
>>
>>>
>>> And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
>>> the right thing re: arches :)
>>>
>>> I could have named the parameter better though, in guard_install_pte_entry()
>>> would be better to have called it 'new_pte' or something.
>>
>> I'd suggest at least describing this in the documentation in pagewalk.h. Or
>> better yet, you could make the pte the return value for the function. Then it is
>> clear because you have no pointer. You'd lose the error code but the only user
>> of this currently can't fail anyway.
> 
> Haha and here you make the same point I did above... great minds :)
> 
> I mean yeah returning a pte would make it clearer what you're doing, but
> then it makes it different from every other callback... but this already is
> different :)
> 
> I do very much want the ability to return an error value to stop the walk
> (if you return >0 you can indicate to caller that a non-error stop occurred
> for instance, something I use on the reading side).
> 
> But we do need to improve this one way or another, at the very least the
> documentation/comments.
> 
> David - any thoughts?

Maybe document "don't use this until we have something better" :D


> 
> I'm not necessarily against just making this consitent, but I like this
> property of us controlling what happens instead of just giving a pointer
> into the page table - the principle of exposing the least possible.
> 
> ANWYAY, I will add to my ever expanding whiteboard TODO list [literally the
> only todo that work for me] to look at this, will definitely improve docs
> at very least.

Maybe we can talk at LSF/MM about this. I assume Oscar might be 
partially covering that in his hugetlb talk.
Lorenzo Stoakes March 7, 2025, 6:04 p.m. UTC | #8
On Fri, Mar 07, 2025 at 06:43:35PM +0100, David Hildenbrand wrote:
> > > It's certainly not read-only in general. Just having a quick look to verify, the
> > > very first callback I landed on was clear_refs_pte_range(), which implements
> > > .pmd_entry to clear the softdirty and access flags from a leaf pmd or from all
> > > the child ptes.
> >
> > Yup sorry I misspoke, working some long hours atm so forgive me :) what I meant
> > to say is that we either read or modify existing.
> >
> > And yes users do do potentially crazy things and yada yada.
> >
> > David and I have spoken quite a few times about implementing generic page
> > table code that could help abstract a lot of things, and it feels like this
> > logic could all be rejigged in some fashion as to prevent the kind of
> > 'everybody does their own handler' logic.q
>
> Oscar is currently prototyping a new pt walker that will batch entries
> (e.g., folio ranges, pte none ares), and not use indirect calls. The primary
> focus will will read-traversal, but nothing speaks against modifications
> (likely helpers for installing pte_none()->marker could be handy, and just
> creating page tables if we hit pmd_none() etc.).
>
> Not sure yet how many use cases we can cover with the initial approach. But
> the idea is to start with something that works for many cases, to then
> gradually improve it.

Nice! Love it.

>
>
> >
> > I guess I felt it was more _dangerous_ as you are establishing _new_
> > mappings here, with the page tables being constructed for you up to the PTE
> > level.
> >
> > And wanted to 'lock things down' somewhat.
> >
> > But indeed, all this cries out for a need for a more generalised, robust
> > interface that handles some of what the downstream users of this are doing.
> >
> > >
> > > >
> > > > When setting things are a little different, I'd rather not open up things to a
> > > > user being able to do *whatever*, but rather limit to the smallest scope
> > > > possible for installing the PTE.
> > >
> > > Understandable, but personally I think it will lead to potential misunderstandings:
> > >
> > >   - it will get copy/pasted as an example of how to set a pte (which is wrong;
> > > you have to use set_pte_at()/set_ptes()). There is currently only a single other
> > > case of direct dereferencing a pte to set it (in write_protect_page()).
> >
> > Yeah, at least renaming the param could help, as 'ptep' implies you really
> > do have a pointer to the page table entry.
> >
> > If we didn't return an error we could just return the PTE value or
> > something... hm.
> >
> > >
> > >   - new users of .install_pte may assume (like I did) that the passed in ptep is
> > > pointing to the pgtable and they will manipulate it with arch helpers. arm64
> > > arch helpers all assume they are only ever passed pointers into pgtable memory.
> > > It will end horribly if that is not the case.
> >
> > It will end very horribly indeed :P or perhaps with more of a fizzle than
> > anticipated...
>
> Yes, I'm hoping we can avoid new users with the old api ... :)

Well indeed, this one was basically 'what is the least worst smallest churn way
of implementing this thing'. But it's not ideal of course.

>
> >
> > >
> > > >
> > > > And also of course, it allows us to _mandate_ that set_pte_at() is used so we do
> > > > the right thing re: arches :)
> > > >
> > > > I could have named the parameter better though, in guard_install_pte_entry()
> > > > would be better to have called it 'new_pte' or something.
> > >
> > > I'd suggest at least describing this in the documentation in pagewalk.h. Or
> > > better yet, you could make the pte the return value for the function. Then it is
> > > clear because you have no pointer. You'd lose the error code but the only user
> > > of this currently can't fail anyway.
> >
> > Haha and here you make the same point I did above... great minds :)
> >
> > I mean yeah returning a pte would make it clearer what you're doing, but
> > then it makes it different from every other callback... but this already is
> > different :)
> >
> > I do very much want the ability to return an error value to stop the walk
> > (if you return >0 you can indicate to caller that a non-error stop occurred
> > for instance, something I use on the reading side).
> >
> > But we do need to improve this one way or another, at the very least the
> > documentation/comments.
> >
> > David - any thoughts?
>
> Maybe document "don't use this until we have something better" :D

Haha well, sure I can say this :P perhaps something like 'in lieu of a
sophisticated generalised page table walker this is a simple means of
installing PTEs, please note that...' something like this.

A British way of doing it :>)

>
>
> >
> > I'm not necessarily against just making this consitent, but I like this
> > property of us controlling what happens instead of just giving a pointer
> > into the page table - the principle of exposing the least possible.
> >
> > ANWYAY, I will add to my ever expanding whiteboard TODO list [literally the
> > only todo that work for me] to look at this, will definitely improve docs
> > at very least.
>
> Maybe we can talk at LSF/MM about this. I assume Oscar might be partially
> covering that in his hugetlb talk.

Indeed, this is going to be a busy LSF... Ryan will be there also by the
way, so can join us!

>
> --
> Cheers,
>
> David / dhildenb
>
diff mbox series

Patch

diff --git a/mm/madvise.c b/mm/madvise.c
index 388dc289b5d1..6170f4acc14f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1101,7 +1101,7 @@  static int guard_install_set_pte(unsigned long addr, unsigned long next,
 	unsigned long *nr_pages = (unsigned long *)walk->private;

 	/* Simply install a PTE marker, this causes segfault on access. */
-	*ptep = make_pte_marker(PTE_MARKER_GUARD);
+	set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
 	(*nr_pages)++;

 	return 0;