[v3,0/2] rust: page: Add support for existing struct page mappings

Message ID 20241119112408.779243-1-abdiel.janulgue@gmail.com (mailing list archive)

Message

Abdiel Janulgue Nov. 19, 2024, 11:24 a.m. UTC
This series aims to add support for pages that are not constructed by an
instance of the rust Page abstraction, for example those returned by
vmalloc_to_page() or virt_to_page().

Changes since v2:
- Use the struct page's reference count to decide when to free the
  allocation (Alice Ryhl, Boqun Feng).
- Make Page::page_slice_to_page handle virt_to_page cases as well
  (Danilo Krummrich).
- Link to v2: https://lore.kernel.org/lkml/20241022224832.1505432-1-abdiel.janulgue@gmail.com/

Changes since v1:
- Use Owned and Ownable types for constructing Page, as suggested,
  instead of using ptr::read().
- Link to v1: https://lore.kernel.org/rust-for-linux/20241007202752.3096472-1-abdiel.janulgue@gmail.com/

Abdiel Janulgue (2):
  rust: page: use the page's reference count to decide when to free the
    allocation
  rust: page: Extend support to existing struct page mappings

 rust/bindings/bindings_helper.h |   1 +
 rust/helpers/page.c             |  20 +++++
 rust/kernel/page.rs             | 135 ++++++++++++++++++++++++++++----
 3 files changed, 142 insertions(+), 14 deletions(-)


base-commit: b2603f8ac8217bc59f5c7f248ac248423b9b99cb

Comments

Matthew Wilcox Nov. 20, 2024, 4:57 a.m. UTC | #1
On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> This series aims to add support for pages that are not constructed by an
> instance of the rust Page abstraction, for example those returned by
> vmalloc_to_page() or virt_to_page().
> 
> Changes since v2:
> - Use the struct page's reference count to decide when to free the
>   allocation (Alice Ryhl, Boqun Feng).

Bleh, this is going to be "exciting".  We're in the middle of a multi-year
project to remove refcounts from struct page.  The lifetime of a page
will be controlled by the memdesc that it belongs to.  Some of those
memdescs will have refcounts, but others will not.

We don't have a fully formed destination yet, so I can't give you a
definite answer to a lot of questions.  Obviously I don't want to hold
up the Rust project in any way, but I need to know that what we're trying
to do will be expressible in Rust.

Can we avoid referring to a page's refcount?
Alice Ryhl Nov. 20, 2024, 9:10 a.m. UTC | #2
On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > This series aims to add support for pages that are not constructed by an
> > instance of the rust Page abstraction, for example those returned by
> > vmalloc_to_page() or virt_to_page().
> >
> > Changes since v2:
> > - Use the struct page's reference count to decide when to free the
> >   allocation (Alice Ryhl, Boqun Feng).
>
> Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> project to remove refcounts from struct page.  The lifetime of a page
> will be controlled by the memdesc that it belongs to.  Some of those
> memdescs will have refcounts, but others will not.
>
> We don't have a fully formed destination yet, so I can't give you a
> definite answer to a lot of questions.  Obviously I don't want to hold
> up the Rust project in any way, but I need to know that what we're trying
> to do will be expressible in Rust.
>
> Can we avoid referring to a page's refcount?

I don't think this patch needs the refcount at all, and the previous
version did not expose it. This came out of the advice to use put_page
over free_page. Does this mean that we should switch to put_page but
not use get_page?

Alice
Boqun Feng Nov. 20, 2024, 4:20 p.m. UTC | #3
On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > > This series aims to add support for pages that are not constructed by an
> > > instance of the rust Page abstraction, for example those returned by
> > > vmalloc_to_page() or virt_to_page().
> > >
> > > Changes since v2:
> > > - Use the struct page's reference count to decide when to free the
> > >   allocation (Alice Ryhl, Boqun Feng).
> >
> > Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> > project to remove refcounts from struct page.  The lifetime of a page
> > will be controlled by the memdesc that it belongs to.  Some of those
> > memdescs will have refcounts, but others will not.
> >

One question: will a page that doesn't have a refcount have an exclusive
owner? I.e. is there one owner that's responsible for freeing the page and
making sure other references to the page get properly invalidated (maybe
via RCU)?

> > We don't have a fully formed destination yet, so I can't give you a
> > definite answer to a lot of questions.  Obviously I don't want to hold
> > up the Rust project in any way, but I need to know that what we're trying
> > to do will be expressible in Rust.
> >
> > Can we avoid referring to a page's refcount?
> 
> I don't think this patch needs the refcount at all, and the previous
> version did not expose it. This came out of the advice to use put_page
> over free_page. Does this mean that we should switch to put_page but
> not use get_page?
> 

I think the point is finding the exact lifetime model for pages: if it's
not simple refcounting, then what is it? Besides, we can still
represent refcounted pages with `struct Page` and other pages with a
different type name. So as far as I can see, this patch is OK for now.

Regards,
Boqun

> Alice
Matthew Wilcox Nov. 20, 2024, 5:02 p.m. UTC | #4
On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > > > This series aims to add support for pages that are not constructed by an
> > > > instance of the rust Page abstraction, for example those returned by
> > > > vmalloc_to_page() or virt_to_page().
> > > >
> > > > Changes since v2:
> > > > - Use the struct page's reference count to decide when to free the
> > > >   allocation (Alice Ryhl, Boqun Feng).
> > >
> > > Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> > > project to remove refcounts from struct page.  The lifetime of a page
> > > will be controlled by the memdesc that it belongs to.  Some of those
> > > memdescs will have refcounts, but others will not.
> > >
> 
> One question: will the page that doesn't have refcounts has an exclusive
> owner? I.e. there is one owner that's responsible to free the page and
> make sure other references to the page get properly invalidated (maybe
> via RCU?)

It's up to the owner of the page how they want to manage freeing it.
They can use a refcount (folios will still have a refcount, for example),
or they can know when there are no more users of the page (eg slab knows
when all objects in a slab are freed).  RCU is a possibility, but would
be quite unusual I would think.  The model I'm looking for here is that
'page' is too low-level an object to have its own lifecycle; it's always
defined by a higher level object.
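
As a rough illustration of that model (this sketch is not code from the thread), here is a userspace Rust toy in which the page carries no refcount and a slab-like owner frees it once the last object carved out of it has been returned. `Page`, `Slab` and all method names are hypothetical stand-ins, not kernel types.

```rust
// Userspace model only: `Page` and `Slab` are hypothetical stand-ins,
// not the kernel's types.

/// A page with no lifetime state of its own: note the absence of a refcount.
struct Page {
    data: Vec<u8>,
}

/// A slab-like owner. It tracks live objects, not references to the page,
/// and frees the page once every object has been returned.
struct Slab {
    page: Option<Page>,
    live_objects: usize,
}

impl Slab {
    fn new(page_size: usize) -> Self {
        Slab {
            page: Some(Page { data: vec![0; page_size] }),
            live_objects: 0,
        }
    }

    /// Hand out one object; returns false if the page is already gone.
    fn alloc_object(&mut self) -> bool {
        if self.page.is_none() {
            return false;
        }
        self.live_objects += 1;
        true
    }

    /// Return one object; the owner drops the page when none remain.
    fn free_object(&mut self) {
        assert!(self.live_objects > 0, "free without matching alloc");
        self.live_objects -= 1;
        if self.live_objects == 0 {
            // The owner, not the page, decides when the page goes away.
            self.page = None;
        }
    }

    /// Size of the backing page, or `None` once it has been freed.
    fn page_len(&self) -> Option<usize> {
        self.page.as_ref().map(|p| p.data.len())
    }
}
```

The point of the toy is only that the lifetime decision lives in `Slab`; a refcounted owner would be a different wrapper around the same `Page`.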

> > > We don't have a fully formed destination yet, so I can't give you a
> > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > up the Rust project in any way, but I need to know that what we're trying
> > > to do will be expressible in Rust.
> > >
> > > Can we avoid referring to a page's refcount?
> > 
> > I don't think this patch needs the refcount at all, and the previous
> > version did not expose it. This came out of the advice to use put_page
> > over free_page. Does this mean that we should switch to put_page but
> > not use get_page?

Did I advise using put_page() over free_page()?  I hope I didn't say
that.  I don't see a reason why binder needs to refcount its pages (nor
use a mapcount on them), but I don't fully understand binder so maybe
it does need a refcount.

> I think the point is finding the exact lifetime model for pages, if it's
> not a simple refcounting, then what it is? Besides, we can still
> represent refcounting pages with `struct Page` and other pages with a
> different type name. So as far as I can see, this patch is OK for now.

I don't want Page to have a refcount.  If you need something with a
refcount, it needs to be called something else.
Boqun Feng Nov. 20, 2024, 5:25 p.m. UTC | #5
On Wed, Nov 20, 2024 at 05:02:14PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > > > > This series aims to add support for pages that are not constructed by an
> > > > > instance of the rust Page abstraction, for example those returned by
> > > > > vmalloc_to_page() or virt_to_page().
> > > > >
> > > > > Changes since v2:
> > > > > - Use the struct page's reference count to decide when to free the
> > > > >   allocation (Alice Ryhl, Boqun Feng).
> > > >
> > > > Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> > > > project to remove refcounts from struct page.  The lifetime of a page
> > > > will be controlled by the memdesc that it belongs to.  Some of those
> > > > memdescs will have refcounts, but others will not.
> > > >
> > 
> > One question: will the page that doesn't have refcounts has an exclusive
> > owner? I.e. there is one owner that's responsible to free the page and
> > make sure other references to the page get properly invalidated (maybe
> > via RCU?)
> 
> It's up to the owner of the page how they want to manage freeing it.
> They can use a refcount (folios will still have a refcount, for example),
> or they can know when there are no more users of the page (eg slab knows
> when all objects in a slab are freed).  RCU is a possibility, but would
> be quite unusual I would think.  The model I'm looking for here is that
> 'page' is too low-level an object to have its own lifecycle; it's always
> defined by a higher level object.
> 

Ok, that makes sense. That's actually aligned with the direction we are
heading in this patch: make `struct Page` itself independent of how the
lifetime is maintained. Conceptually, say we can define folio in pure
Rust, it could be:

    struct Folio {
        head: Page, /* or a union of page */
        // ...
    }

and we can `impl AlwaysRefcounted for Folio`, which implies there is a
refcount inside. And we can also have a `Foo` being:

    struct Foo {
        inner: Page,
    }

which doesn't implement `AlwaysRefcounted`, and that suggests a
different way the page lifetime will be maintained.
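
To make the distinction concrete, the following is a minimal userspace sketch under stated assumptions. The `AlwaysRefcounted` trait here is a simplified stand-in with made-up method signatures, not the kernel crate's trait, and `Page`, `Folio`, `Foo` and `share` are toy definitions for illustration only.

```rust
use std::cell::Cell;

// Userspace model only. All names below are simplified stand-ins and do not
// match the kernel crate's real definitions.

/// Plain page: carries no lifetime policy of its own.
struct Page {
    pfn: usize,
}

/// Implemented only by wrappers that contain their own reference count.
trait AlwaysRefcounted {
    fn inc_ref(&self);
    /// Returns true when the last reference was dropped.
    fn dec_ref(&self) -> bool;
}

/// Refcounted wrapper around a page.
struct Folio {
    head: Page,
    refcount: Cell<usize>,
}

impl Folio {
    fn new(pfn: usize) -> Self {
        Folio { head: Page { pfn }, refcount: Cell::new(1) }
    }
}

impl AlwaysRefcounted for Folio {
    fn inc_ref(&self) {
        self.refcount.set(self.refcount.get() + 1);
    }

    fn dec_ref(&self) -> bool {
        let remaining = self.refcount.get() - 1;
        self.refcount.set(remaining);
        remaining == 0
    }
}

/// Wrapper without a refcount: its lifetime is managed some other way.
struct Foo {
    inner: Page,
}

/// Only refcounted wrappers can be shared like this; passing a `&Foo`
/// is rejected at compile time because `Foo` lacks the trait.
fn share<T: AlwaysRefcounted>(obj: &T) -> &T {
    obj.inc_ref();
    obj
}
```

In this toy, `share(&folio)` compiles while `share(&foo)` does not, which is the sense in which the trait bound, rather than `Page` itself, encodes the lifetime policy.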

> > > > We don't have a fully formed destination yet, so I can't give you a
> > > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > > up the Rust project in any way, but I need to know that what we're trying
> > > > to do will be expressible in Rust.
> > > >
> > > > Can we avoid referring to a page's refcount?
> > > 
> > > I don't think this patch needs the refcount at all, and the previous
> > > version did not expose it. This came out of the advice to use put_page
> > > over free_page. Does this mean that we should switch to put_page but
> > > not use get_page?
> 
> Did I advise using put_page() over free_page()?  I hope I didn't say

We had an off-list discussion about how free_page() doesn't always free
the page, if you remember.

> that.  I don't see a reason why binder needs to refcount its pages (nor
> use a mapcount on them), but I don't fully understand binder so maybe
> it does need a refcount.

I don't think binder needs it either, but I think Abdiel here has a
different usage than binder.

> 
> > I think the point is finding the exact lifetime model for pages, if it's
> > not a simple refcounting, then what it is? Besides, we can still
> > represent refcounting pages with `struct Page` and other pages with a
> > different type name. So as far as I can see, this patch is OK for now.
> 
> I don't want Page to have a refcount.  If you need something with a
> refcount, it needs to be called something else.

So if I understand correctly, what Abdiel needs here is a way to convert
a virtual address to the corresponding page. Would it make sense to just
use a folio in this case? Abdiel, what's the operation you are going to
call on the page you get?

Regards,
Boqun
Abdiel Janulgue Nov. 20, 2024, 10:56 p.m. UTC | #6
On 20/11/2024 19:25, Boqun Feng wrote:
> On Wed, Nov 20, 2024 at 05:02:14PM +0000, Matthew Wilcox wrote:
>> On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
>>> On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
>>>> On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>>>
>>>>> On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
>>>>>> This series aims to add support for pages that are not constructed by an
>>>>>> instance of the rust Page abstraction, for example those returned by
>>>>>> vmalloc_to_page() or virt_to_page().
>>>>>>
>>>>>> Changes since v2:
>>>>>> - Use the struct page's reference count to decide when to free the
>>>>>>    allocation (Alice Ryhl, Boqun Feng).
>>>>>
>>>>> Bleh, this is going to be "exciting".  We're in the middle of a multi-year
>>>>> project to remove refcounts from struct page.  The lifetime of a page
>>>>> will be controlled by the memdesc that it belongs to.  Some of those
>>>>> memdescs will have refcounts, but others will not.
>>>>>
>>>
>>> One question: will the page that doesn't have refcounts has an exclusive
>>> owner? I.e. there is one owner that's responsible to free the page and
>>> make sure other references to the page get properly invalidated (maybe
>>> via RCU?)
>>
>> It's up to the owner of the page how they want to manage freeing it.
>> They can use a refcount (folios will still have a refcount, for example),
>> or they can know when there are no more users of the page (eg slab knows
>> when all objects in a slab are freed).  RCU is a possibility, but would
>> be quite unusual I would think.  The model I'm looking for here is that
>> 'page' is too low-level an object to have its own lifecycle; it's always
>> defined by a higher level object.
>>
> 
> Ok, that makes sense. That's actually aligned with the direction we are
> heading in this patch: make `struct Page` itself independent on how the
> lifetime is maintained. Conceptually, say we can define folio in pure
> Rust, it could be:
> 
>      struct Folio {
>          head: Page, /* or a union of page */
> 	...
>      }
> 
> and we can `impl AlwaysRefcounted for Folio`, which implies there is a
> refcount inside. And we can also have a `Foo` being:
> 
>      struct Foo {
>          inner: Page,
>      }
> 
> which doesn't implement `AlwaysRefcounted`, and that suggests a
> different way the page lifetime will be maintained.
> 
>>>>> We don't have a fully formed destination yet, so I can't give you a
>>>>> definite answer to a lot of questions.  Obviously I don't want to hold
>>>>> up the Rust project in any way, but I need to know that what we're trying
>>>>> to do will be expressible in Rust.
>>>>>
>>>>> Can we avoid referring to a page's refcount?
>>>>
>>>> I don't think this patch needs the refcount at all, and the previous
>>>> version did not expose it. This came out of the advice to use put_page
>>>> over free_page. Does this mean that we should switch to put_page but
>>>> not use get_page?
>>
>> Did I advise using put_page() over free_page()?  I hope I didn't say
> 
> We have some off-list discussion about free_page() doesn't always free
> the page if you could remember.
> 
>> that.  I don't see a reason why binder needs to refcount its pages (nor
>> use a mapcount on them), but I don't fully understand binder so maybe
>> it does need a refcount.
> 
> I don't think binder needs it either, but I think Abdiel here has a
> different usage than binder.
> 
>>
>>> I think the point is finding the exact lifetime model for pages, if it's
>>> not a simple refcounting, then what it is? Besides, we can still
>>> represent refcounting pages with `struct Page` and other pages with a
>>> different type name. So as far as I can see, this patch is OK for now.
>>
>> I don't want Page to have a refcount.  If you need something with a
>> refcount, it needs to be called something else.
> 
> So if I understand correctly, what Abdiel needs here is a way to convert
> a virtual address to the corresponding page, would it make sense to just
> use folio in this case? Abdiel, what's the operation you are going to
> call on the page you get?

Yes, that's basically it. The goal here is to represent those existing
struct pages within this Rust Page abstraction while avoiding taking over
their ownership.

Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
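
For readers unfamiliar with the idea, here is a userspace sketch of what an owned handle versus a plain borrow could look like. This is an assumption-laden toy: the shapes of `Ownable` and `Owned` below are guesses made for illustration, not the definitions from the patch, and `borrow_existing` and `RELEASED` are hypothetical helpers.

```rust
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

// Userspace model only. `Page`, `Ownable` and `Owned` are hypothetical
// stand-ins; the kernel's actual definitions may look different.

/// Counts releases so the behaviour can be observed.
static RELEASED: AtomicUsize = AtomicUsize::new(0);

struct Page {
    pfn: usize,
}

/// A type that knows how to free itself when its unique owner goes away.
trait Ownable {
    /// # Safety
    /// `this` must be valid and must not be used again afterwards.
    unsafe fn release(this: NonNull<Self>);
}

/// Unique owning handle: releases the object on drop.
struct Owned<T: Ownable> {
    ptr: NonNull<T>,
}

impl<T: Ownable> Owned<T> {
    /// # Safety
    /// The caller hands over ownership of a valid `ptr`.
    unsafe fn from_raw(ptr: NonNull<T>) -> Self {
        Owned { ptr }
    }
}

impl<T: Ownable> Deref for Owned<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` is valid for as long as the `Owned` exists.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: Ownable> Drop for Owned<T> {
    fn drop(&mut self) {
        // SAFETY: we are the unique owner and never touch `ptr` again.
        unsafe { T::release(self.ptr) }
    }
}

impl Ownable for Page {
    unsafe fn release(this: NonNull<Self>) {
        // SAFETY: in this model every page was created with `Box::new`.
        drop(unsafe { Box::from_raw(this.as_ptr()) });
        RELEASED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Borrow a page that someone else owns; dropping the reference frees nothing.
///
/// # Safety
/// `ptr` must stay valid for the whole lifetime `'a`.
unsafe fn borrow_existing<'a>(ptr: NonNull<Page>) -> &'a Page {
    // SAFETY: guaranteed by the caller.
    unsafe { &*ptr.as_ptr() }
}
```

The borrow path is the "existing struct page" case: the caller gets a `&Page` and nothing is freed when it goes out of scope, whereas dropping an `Owned<Page>` runs `release`.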

Regards,
Abdiel
Boqun Feng Nov. 21, 2024, 12:24 a.m. UTC | #7
On Thu, Nov 21, 2024 at 12:56:38AM +0200, Abdiel Janulgue wrote:
> On 20/11/2024 19:25, Boqun Feng wrote:
> > On Wed, Nov 20, 2024 at 05:02:14PM +0000, Matthew Wilcox wrote:
> > > On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > > > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > 
> > > > > > On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > > > > > > This series aims to add support for pages that are not constructed by an
> > > > > > > instance of the rust Page abstraction, for example those returned by
> > > > > > > vmalloc_to_page() or virt_to_page().
> > > > > > > 
> > > > > > > Changes since v2:
> > > > > > > - Use the struct page's reference count to decide when to free the
> > > > > > >    allocation (Alice Ryhl, Boqun Feng).
> > > > > > 
> > > > > > Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> > > > > > project to remove refcounts from struct page.  The lifetime of a page
> > > > > > will be controlled by the memdesc that it belongs to.  Some of those
> > > > > > memdescs will have refcounts, but others will not.
> > > > > > 
> > > > 
> > > > One question: will the page that doesn't have refcounts has an exclusive
> > > > owner? I.e. there is one owner that's responsible to free the page and
> > > > make sure other references to the page get properly invalidated (maybe
> > > > via RCU?)
> > > 
> > > It's up to the owner of the page how they want to manage freeing it.
> > > They can use a refcount (folios will still have a refcount, for example),
> > > or they can know when there are no more users of the page (eg slab knows
> > > when all objects in a slab are freed).  RCU is a possibility, but would
> > > be quite unusual I would think.  The model I'm looking for here is that
> > > 'page' is too low-level an object to have its own lifecycle; it's always
> > > defined by a higher level object.
> > > 
> > 
> > Ok, that makes sense. That's actually aligned with the direction we are
> > heading in this patch: make `struct Page` itself independent on how the
> > lifetime is maintained. Conceptually, say we can define folio in pure
> > Rust, it could be:
> > 
> >      struct Folio {
> >          head: Page, /* or a union of page */
> > 	...
> >      }
> > 
> > and we can `impl AlwaysRefcounted for Folio`, which implies there is a
> > refcount inside. And we can also have a `Foo` being:
> > 
> >      struct Foo {
> >          inner: Page,
> >      }
> > 
> > which doesn't implement `AlwaysRefcounted`, and that suggests a
> > different way the page lifetime will be maintained.
> > 
> > > > > > We don't have a fully formed destination yet, so I can't give you a
> > > > > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > > > > up the Rust project in any way, but I need to know that what we're trying
> > > > > > to do will be expressible in Rust.
> > > > > > 
> > > > > > Can we avoid referring to a page's refcount?
> > > > > 
> > > > > I don't think this patch needs the refcount at all, and the previous
> > > > > version did not expose it. This came out of the advice to use put_page
> > > > > over free_page. Does this mean that we should switch to put_page but
> > > > > not use get_page?
> > > 
> > > Did I advise using put_page() over free_page()?  I hope I didn't say
> > 
> > We have some off-list discussion about free_page() doesn't always free
> > the page if you could remember.
> > 
> > > that.  I don't see a reason why binder needs to refcount its pages (nor
> > > use a mapcount on them), but I don't fully understand binder so maybe
> > > it does need a refcount.
> > 
> > I don't think binder needs it either, but I think Abdiel here has a
> > different usage than binder.
> > 
> > > 
> > > > I think the point is finding the exact lifetime model for pages, if it's
> > > > not a simple refcounting, then what it is? Besides, we can still
> > > > represent refcounting pages with `struct Page` and other pages with a
> > > > different type name. So as far as I can see, this patch is OK for now.
> > > 
> > > I don't want Page to have a refcount.  If you need something with a
> > > refcount, it needs to be called something else.
> > 
> > So if I understand correctly, what Abdiel needs here is a way to convert
> > a virtual address to the corresponding page, would it make sense to just
> > use folio in this case? Abdiel, what's the operation you are going to
> > call on the page you get?
> 
> Yes that's basically it. The goal here is represent those existing struct
> page within this rust Page abstraction but at the same time to avoid taking
> over its ownership.
> 
> Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
> 

Could you use folio in your case? If so, we can provide a simple binding
for folio which should be `AlwaysRefcounted`, and re-investigate how
page should be wrapped.

Regards,
Boqun

> Regards,
> Abdiel
Alice Ryhl Nov. 21, 2024, 9:19 a.m. UTC | #8
On Thu, Nov 21, 2024 at 1:24 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Thu, Nov 21, 2024 at 12:56:38AM +0200, Abdiel Janulgue wrote:
> > On 20/11/2024 19:25, Boqun Feng wrote:
> > > On Wed, Nov 20, 2024 at 05:02:14PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > > > > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > > > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > >
> > > > > > > On Tue, Nov 19, 2024 at 01:24:01PM +0200, Abdiel Janulgue wrote:
> > > > > > > > This series aims to add support for pages that are not constructed by an
> > > > > > > > instance of the rust Page abstraction, for example those returned by
> > > > > > > > vmalloc_to_page() or virt_to_page().
> > > > > > > >
> > > > > > > > Changes since v2:
> > > > > > > > - Use the struct page's reference count to decide when to free the
> > > > > > > >    allocation (Alice Ryhl, Boqun Feng).
> > > > > > >
> > > > > > > Bleh, this is going to be "exciting".  We're in the middle of a multi-year
> > > > > > > project to remove refcounts from struct page.  The lifetime of a page
> > > > > > > will be controlled by the memdesc that it belongs to.  Some of those
> > > > > > > memdescs will have refcounts, but others will not.
> > > > > > >
> > > > >
> > > > > One question: will the page that doesn't have refcounts has an exclusive
> > > > > owner? I.e. there is one owner that's responsible to free the page and
> > > > > make sure other references to the page get properly invalidated (maybe
> > > > > via RCU?)
> > > >
> > > > It's up to the owner of the page how they want to manage freeing it.
> > > > They can use a refcount (folios will still have a refcount, for example),
> > > > or they can know when there are no more users of the page (eg slab knows
> > > > when all objects in a slab are freed).  RCU is a possibility, but would
> > > > be quite unusual I would think.  The model I'm looking for here is that
> > > > 'page' is too low-level an object to have its own lifecycle; it's always
> > > > defined by a higher level object.
> > > >
> > >
> > > Ok, that makes sense. That's actually aligned with the direction we are
> > > heading in this patch: make `struct Page` itself independent on how the
> > > lifetime is maintained. Conceptually, say we can define folio in pure
> > > Rust, it could be:
> > >
> > >      struct Folio {
> > >          head: Page, /* or a union of page */
> > >     ...
> > >      }
> > >
> > > and we can `impl AlwaysRefcounted for Folio`, which implies there is a
> > > refcount inside. And we can also have a `Foo` being:
> > >
> > >      struct Foo {
> > >          inner: Page,
> > >      }
> > >
> > > which doesn't implement `AlwaysRefcounted`, and that suggests a
> > > different way the page lifetime will be maintained.
> > >
> > > > > > > We don't have a fully formed destination yet, so I can't give you a
> > > > > > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > > > > > up the Rust project in any way, but I need to know that what we're trying
> > > > > > > to do will be expressible in Rust.
> > > > > > >
> > > > > > > Can we avoid referring to a page's refcount?
> > > > > >
> > > > > > I don't think this patch needs the refcount at all, and the previous
> > > > > > version did not expose it. This came out of the advice to use put_page
> > > > > > over free_page. Does this mean that we should switch to put_page but
> > > > > > not use get_page?
> > > >
> > > > Did I advise using put_page() over free_page()?  I hope I didn't say
> > >
> > > We have some off-list discussion about free_page() doesn't always free
> > > the page if you could remember.
> > >
> > > > that.  I don't see a reason why binder needs to refcount its pages (nor
> > > > use a mapcount on them), but I don't fully understand binder so maybe
> > > > it does need a refcount.
> > >
> > > I don't think binder needs it either, but I think Abdiel here has a
> > > different usage than binder.
> > >
> > > >
> > > > > I think the point is finding the exact lifetime model for pages, if it's
> > > > > not a simple refcounting, then what it is? Besides, we can still
> > > > > represent refcounting pages with `struct Page` and other pages with a
> > > > > different type name. So as far as I can see, this patch is OK for now.
> > > >
> > > > I don't want Page to have a refcount.  If you need something with a
> > > > refcount, it needs to be called something else.
> > >
> > > So if I understand correctly, what Abdiel needs here is a way to convert
> > > a virtual address to the corresponding page, would it make sense to just
> > > use folio in this case? Abdiel, what's the operation you are going to
> > > call on the page you get?
> >
> > Yes that's basically it. The goal here is represent those existing struct
> > page within this rust Page abstraction but at the same time to avoid taking
> > over its ownership.
> >
> > Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
> >
>
> Could you use folio in your case? If so, we can provide a simple binding
> for folio which should be `AlwaysRefcounted`, and re-investigate how
> page should be wrapped.

Well, regardless of that, I do think it sounds like Owned / Ownable is
the right way forward for Page.

Alice
Abdiel Janulgue Nov. 21, 2024, 9:30 a.m. UTC | #9
Hi Boqun, Matthew:

On 21/11/2024 02:24, Boqun Feng wrote:
>>> So if I understand correctly, what Abdiel needs here is a way to convert
>>> a virtual address to the corresponding page, would it make sense to just
>>> use folio in this case? Abdiel, what's the operation you are going to
>>> call on the page you get?
>>
>> Yes that's basically it. The goal here is represent those existing struct
>> page within this rust Page abstraction but at the same time to avoid taking
>> over its ownership.
>>
>> Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
>>
> 
> Could you use folio in your case? If so, we can provide a simple binding
> for folio which should be `AlwaysRefcounted`, and re-investigate how
> page should be wrapped.
> 

I'm not sure. Is there a way to get the struct folio from a vmalloc'd
address, e.g. vmalloc_to_folio()?

Regards,
Abdiel
Boqun Feng Nov. 21, 2024, 7:10 p.m. UTC | #10
[Cc Kairui in case he's interested]

On Thu, Nov 21, 2024 at 11:30:13AM +0200, Abdiel Janulgue wrote:
> Hi Boqun, Matthew:
> 
> On 21/11/2024 02:24, Boqun Feng wrote:
> > > > So if I understand correctly, what Abdiel needs here is a way to convert
> > > > a virtual address to the corresponding page, would it make sense to just
> > > > use folio in this case? Abdiel, what's the operation you are going to
> > > > call on the page you get?
> > > 
> > > Yes that's basically it. The goal here is represent those existing struct
> > > page within this rust Page abstraction but at the same time to avoid taking
> > > over its ownership.
> > > 
> > > Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
> > > 
> > 
> > Could you use folio in your case? If so, we can provide a simple binding
> > for folio which should be `AlwaysRefcounted`, and re-investigate how
> > page should be wrapped.
> > 
> 
> I'm not sure. Is there a way to get the struct folio from a vmalloc'd
> address, e.g vmalloc_to_folio()?
> 

I think you can use page_folio(vmalloc_to_page(..)) to get the folio,
but one thing to note is that a folio is guaranteed to be a non-tail
page, so if you want to do something later with the particular page (if
it's a tail page), you will need to know the offset of that page in the
folio. You can do something like the below:

    pub fn page_slice_to_folio<'a>(page: &PageSlice) -> Result<(&'a Folio, usize)> {
        ...
        let page = vmalloc_to_page(ptr);

        let folio = page_folio(page);
        let offset = folio_page_idx(folio, page);

        Ok((folio, offset))
    }

And you have a folio -> page function like:

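    pub struct Folio(Opaque<bindings::folio>);

    impl Folio {
        pub fn nth_page(&self, n: usize) -> &Page {
            // Sketch only: assumes `n` is within the folio and that the
            // returned `struct page` pointer can be viewed as a `Page`.
            unsafe { &*(nth_page(self.0.get(), n)) }
        }
    }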

Of course, this is me acting as if I know MM ;-) but I feel this is the way
to go. And if binder can use folio as well (I don't see a reason why
not, but it's extra work, so I defer to Alice), then we would only need
the `pub struct Page { inner: Opaque<bindings::page> }` part in your
patch #1, and can avoid doing `Ownable` or `AlwaysRefcounted` for
`Page`.
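
The page-to-folio-plus-offset idea above can be tried out in userspace with a toy memory map. The sketch below only borrows the names `page_folio`, `folio_page_idx` and `nth_page` from the C helpers; the implementations, the `PageDesc` type and `build_memmap` are assumptions made for illustration and operate on array indices instead of real `struct page` pointers.

```rust
// Userspace model only: pages are indices into a Vec, and every function
// here is a toy reimplementation, not a kernel binding.

#[derive(Clone, Copy, Debug, PartialEq)]
struct PageDesc {
    /// Index of the head page of the folio containing this page.
    head: usize,
}

/// Build a toy memory map from a list of folio sizes (in pages).
fn build_memmap(folio_sizes: &[usize]) -> Vec<PageDesc> {
    let mut map = Vec::new();
    for &n in folio_sizes {
        let head = map.len();
        map.extend(std::iter::repeat(PageDesc { head }).take(n));
    }
    map
}

/// Toy `page_folio()`: resolve any page (head or tail) to its folio's head.
fn page_folio(map: &[PageDesc], page: usize) -> Option<usize> {
    map.get(page).map(|p| p.head)
}

/// Toy `folio_page_idx()`: offset of `page` within `folio`.
fn folio_page_idx(folio: usize, page: usize) -> usize {
    page - folio
}

/// Resolve a page to (folio, offset), mirroring the shape of the sketch above.
fn page_to_folio(map: &[PageDesc], page: usize) -> Option<(usize, usize)> {
    let folio = page_folio(map, page)?;
    Some((folio, folio_page_idx(folio, page)))
}

/// Toy `nth_page()`: the n-th page of `folio`, or `None` if that would
/// step outside the folio.
fn nth_page(map: &[PageDesc], folio: usize, n: usize) -> Option<usize> {
    let page = folio.checked_add(n)?;
    (page_folio(map, page)? == folio).then_some(page)
}
```

With folios of 1, 4 and 2 pages, page 3 resolves to folio 1 at offset 2, and asking folio 1 for its page at offset 2 returns page 3 again, so the two directions round-trip.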

Thoughts?

Regards,
Boqun

> Regards,
> Abdiel
Boqun Feng Nov. 21, 2024, 7:12 p.m. UTC | #11
On Thu, Nov 21, 2024 at 11:10:45AM -0800, Boqun Feng wrote:
> [Cc Kairui in case he's interested]
> 

(forgot to cc...)

> On Thu, Nov 21, 2024 at 11:30:13AM +0200, Abdiel Janulgue wrote:
> > Hi Boqun, Matthew:
> > 
> > On 21/11/2024 02:24, Boqun Feng wrote:
> > > > > So if I understand correctly, what Abdiel needs here is a way to convert
> > > > > a virtual address to the corresponding page, would it make sense to just
> > > > > use folio in this case? Abdiel, what's the operation you are going to
> > > > > call on the page you get?
> > > > 
> > > > Yes that's basically it. The goal here is represent those existing struct
> > > > page within this rust Page abstraction but at the same time to avoid taking
> > > > over its ownership.
> > > > 
> > > > Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
> > > > 
> > > 
> > > Could you use folio in your case? If so, we can provide a simple binding
> > > for folio which should be `AlwaysRefcounted`, and re-investigate how
> > > page should be wrapped.
> > > 
> > 
> > I'm not sure. Is there a way to get the struct folio from a vmalloc'd
> > address, e.g. vmalloc_to_folio()?
> > 
> 
> I think you can use page_folio(vmalloc_to_page(..)) to get the folio,
> but one thing to notice is that folio is guaranteed to be a non-tail
> page, so if you want to do something later for the particular page (if
> it's a tail page), you will need to know the offset of that page in the
> folio. You can do something like below:
> 
>     pub fn page_slice_to_folio<'a>(page: &PageSlice) -> Result<(&'a Folio, usize)> {
>         ...
> 	let page = vmalloc_to_page(ptr);
> 
> 	let folio = page_folio(page);
> 	let offset = folio_page_idx(folio, page);
> 
> 	Ok((folio, offset))
>     }	
> 
> And you have a folio -> page function like:
> 
>     pub struct Folio(Opaque<bindings::folio>);
> 
>     impl Folio {
>         pub fn nth_page(&self, n: usize) -> &Page {
> 	    &*(nth_page(self.0.get(), n))
> 	}
>     }
> 
> Of course, this is me acting as if I know MM ;-) but I feel this is the way
> to go. And if binder can use folio as well (I don't see a reason why
> not, but it's extra work, so defer to Alice), then we would only need
> the `pub struct Page { inner: Opaque<bindings::page> }` part in your
> patch #1, and can avoid doing `Ownable` or `AlwaysRefcounted` for
> `Page`.
> 
> Thoughts?
> 
> Regards,
> Boqun
> 
> > Regards,
> > Abdiel
Matthew Wilcox Nov. 21, 2024, 10:01 p.m. UTC | #12
On Thu, Nov 21, 2024 at 11:12:30AM -0800, Boqun Feng wrote:
> On Thu, Nov 21, 2024 at 11:30:13AM +0200, Abdiel Janulgue wrote:
> > Hi Boqun, Matthew:
> > 
> > On 21/11/2024 02:24, Boqun Feng wrote:
> > > > > So if I understand correctly, what Abdiel needs here is a way to convert
> > > > > a virtual address to the corresponding page, would it make sense to just
> > > > > use folio in this case? Abdiel, what's the operation you are going to
> > > > > call on the page you get?
> > > > 
> > > > Yes that's basically it. The goal here is to represent those existing struct
> > > > page within this rust Page abstraction but at the same time to avoid taking
> > > > over its ownership.
> > > > 
> > > > Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
> > > > 
> > > 
> > > Could you use folio in your case? If so, we can provide a simple binding
> > > for folio which should be `AlwaysRefcounted`, and re-investigate how
> > > page should be wrapped.
> > > 
> > 
> > I'm not sure. Is there a way to get the struct folio from a vmalloc'd
> > address, e.g. vmalloc_to_folio()?
> > 
> 
> I think you can use page_folio(vmalloc_to_page(..)) to get the folio,
> but one thing to notice is that folio is guaranteed to be a non-tail
> page, so if you want to do something later for the particular page (if
> it's a tail page), you will need to know the offset of that page in the
> folio. You can do something like below:

This is one of those things which will work today, but will stop
working in the future, and anyway will only appear to work for some
users.

For example, neither vmalloc nor slab allocations use the refcount
on the struct page for anything, so this will be a UAF (please excuse
me writing in C):

	char *a = kmalloc(256, GFP_KERNEL);
	struct page *page = virt_to_page(a);
	get_page(page);	/* get_page() returns void, so it cannot wrap virt_to_page() */
	char *b = page_address(page) + offset_in_page(a);
	// a and b will now have the same bit pattern
	kfree(a);
	*b = 1;

Once you've called kfree(), slab is entitled to hand that memory out
to any other user of kmalloc().  This might actually work to protect
vmalloc() memory from going away under you, but I intend to change
vmalloc so that it won't work (nothing to do with this patch series,
rather an approach to make vmalloc more efficient).
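
A toy userspace model of why the page refcount does not help here. This is
only a sketch: `SlabPageModel` and its methods are hypothetical names, and
the real slab allocator is far more involved. The one property it tries to
capture is that freeing an object makes its slot reusable no matter how
many references the containing page has.

```rust
/// Toy model of one slab page: fixed-size slots plus a page-level refcount.
/// All names are hypothetical; this is not how the kernel's slab is built.
pub struct SlabPageModel {
    page_refcount: u32,
    /// `Some(owner)` while the slot is allocated, `None` when free.
    slots: Vec<Option<u32>>,
}

impl SlabPageModel {
    pub fn new(nr_slots: usize) -> Self {
        Self {
            page_refcount: 1,
            slots: vec![None; nr_slots],
        }
    }

    /// Models get_page(): bumps the page refcount, which `free` never consults.
    pub fn get_page(&mut self) {
        self.page_refcount += 1;
    }

    pub fn page_refcount(&self) -> u32 {
        self.page_refcount
    }

    /// Models kmalloc(): hands the first free slot to `owner`.
    pub fn alloc(&mut self, owner: u32) -> Option<usize> {
        let idx = self.slots.iter().position(|s| s.is_none())?;
        self.slots[idx] = Some(owner);
        Some(idx)
    }

    /// Models kfree(): the slot becomes reusable regardless of the refcount.
    pub fn free(&mut self, idx: usize) {
        self.slots[idx] = None;
    }

    /// Current owner of a slot, if any.
    pub fn owner(&self, idx: usize) -> Option<u32> {
        self.slots[idx]
    }
}
```

After `alloc`, `get_page`, `free`, a second `alloc` returns the same slot to
a different owner while the page refcount is still elevated, which is the
situation the stale pointer `b` writes into.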

One reason you're confused today is that we have a temporary ambiguity
around what "folio" actually means.  The original definition (ie mine) was
simply that it was a non-tail page.  We're moving towards the definition
Johannes wanted, which is that it's only the memdesc for anonymous &
file-backed memory [1].  So while vmalloc_to_folio() makes sense under
the original definition, it's an absurdity under the new definition.

So, Abdiel, why are you trying to add this?  What are you actually
trying to accomplish in terms of "I am writing a device driver for XXX
and I need to ..."?  You've been very evasive up to now.

[1] Actually Johannes wants to split them apart even further so that
anon & file memory have different types, and we may yet get there.
One step at a time.
Abdiel Janulgue Nov. 21, 2024, 11:18 p.m. UTC | #13
On 22/11/2024 00:01, Matthew Wilcox wrote:
> On Thu, Nov 21, 2024 at 11:12:30AM -0800, Boqun Feng wrote:
>> On Thu, Nov 21, 2024 at 11:30:13AM +0200, Abdiel Janulgue wrote:
>>> Hi Boqun, Matthew:
>>>
>>> On 21/11/2024 02:24, Boqun Feng wrote:
>>>>>> So if I understand correctly, what Abdiel needs here is a way to convert
>>>>>> a virtual address to the corresponding page, would it make sense to just
>>>>>> use folio in this case? Abdiel, what's the operation you are going to
>>>>>> call on the page you get?
>>>>>
>>>>> Yes that's basically it. The goal here is to represent those existing struct
>>>>> page within this rust Page abstraction but at the same time to avoid taking
>>>>> over its ownership.
>>>>>
>>>>> Boqun, Alice, should we reconsider Ownable and Owned trait again? :)
>>>>>
>>>>
>>>> Could you use folio in your case? If so, we can provide a simple binding
>>>> for folio which should be `AlwaysRefcounted`, and re-investigate how
>>>> page should be wrapped.
>>>>
>>>
>>> I'm not sure. Is there a way to get the struct folio from a vmalloc'd
>>> address, e.g. vmalloc_to_folio()?
>>>
>>
>> I think you can use page_folio(vmalloc_to_page(..)) to get the folio,
>> but one thing to notice is that folio is guaranteed to be a non-tail
>> page, so if you want to do something later for the particular page (if
>> it's a tail page), you will need to know the offset of that page in the
>> folio. You can do something like below:
> 
> This is one of those things which will work today, but will stop
> working in the future, and anyway will only appear to work for some
> users.
> 
> For example, neither vmalloc nor slab allocations use the refcount
> on the struct page for anything, so this will be a UAF (please excuse
> me writing in C):
> 
> 	char *a = kmalloc(256, GFP_KERNEL);
> 	struct page *page = get_page(virt_to_page(a));
> 	char *b = page_address(page) + offset_in_page(a);
> 	// a and b will now have the same bit pattern
> 	kfree(a);
> 	*b = 1;
> 
> Once you've called kfree(), slab is entitled to hand that memory out
> to any other user of kmalloc().  This might actually work to protect
> vmalloc() memory from going away under you, but I intend to change
> vmalloc so that it won't work (nothing to do with this patch series,
> rather an approach to make vmalloc more efficient).
> 
> One reason you're confused today is that we have a temporary ambiguity
> around what "folio" actually means.  The original definition (ie mine) was
> simply that it was a non-tail page.  We're moving towards the definition
> Johannes wanted, which is that it's only the memdesc for anonymous &
> file-backed memory [1].  So while vmalloc_to_folio() makes sense under
> the original definition, it's an absurdity under the new definition.
> 
> So, Abdiel, why are you trying to add this?  What are you actually
> trying to accomplish in terms of "I am writing a device driver for XXX
> and I need to ..."?  You've been very evasive up to now.

The background is that we need this for the Nova Rust driver [0].

We need an abstraction of struct page to construct a scatterlist, which
is needed for an internal firmware structure. Most of the pages needed
there come from vmalloc_to_page(), which, unlike pages in the current Rust
Page abstraction, are not allocated on demand but belong to an existing mapping.

Hope that clears things up!

Regards,
Abdiel

[0] https://rust-for-linux.com/nova-gpu-driver
Matthew Wilcox Nov. 22, 2024, 1:24 a.m. UTC | #14
On Fri, Nov 22, 2024 at 01:18:28AM +0200, Abdiel Janulgue wrote:
> We need an abstraction of struct page to construct a scatterlist which is
> needed for an internal firmware structure. Now most of pages needed there
> come from vmalloc_to_page() which, unlike the current rust Page abstraction,
> are not allocated on demand but belong to an existing mapping.
> 
> Hope that clears things up!

That's very helpful!  So the lifetime of the scatterlist must not
outlive the lifetime of the vmalloc allocation.  That means you can call
kmap_local_page() on the page in the scatterlist without worrying about
the refcount of the struct page.  BTW, you can't call page_address() on
vmalloc memory because vmalloc can allocate pages from HIGHMEM.  Unless
you're willing to disable support for 32-bit systems with highmem ...
David Airlie Nov. 22, 2024, 6:58 a.m. UTC | #15
On Fri, Nov 22, 2024 at 11:24 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 01:18:28AM +0200, Abdiel Janulgue wrote:
> > We need an abstraction of struct page to construct a scatterlist which is
> > needed for an internal firmware structure. Now most of pages needed there
> > come from vmalloc_to_page() which, unlike the current rust Page abstraction,
> > are not allocated on demand but belong to an existing mapping.
> >
> > Hope that clears things up!
>
> That's very helpful!  So the lifetime of the scatterlist must not
> outlive the lifetime of the vmalloc allocation.  That means you can call
> kmap_local_page() on the page in the scatterlist without worrying about
> the refcount of the struct page.  BTW, you can't call page_address() on
> vmalloc memory because vmalloc can allocate pages from HIGHMEM.  Unless
> you're willing to disable support for 32-bit systems with highmem ...
>

https://elixir.bootlin.com/linux/v6.11.5/source/drivers/gpu/drm/nouveau/nvkm/core/firmware.c#L266

This is the C code we want to rustify.

Dave.
Paolo Bonzini Nov. 22, 2024, 12:37 p.m. UTC | #16
On 11/22/24 07:58, David Airlie wrote:
> On Fri, Nov 22, 2024 at 11:24 AM Matthew Wilcox <willy@infradead.org> wrote:
>>
>> On Fri, Nov 22, 2024 at 01:18:28AM +0200, Abdiel Janulgue wrote:
>>> We need an abstraction of struct page to construct a scatterlist which is
>>> needed for an internal firmware structure. Now most of pages needed there
>>> come from vmalloc_to_page() which, unlike the current rust Page abstraction,
>>> are not allocated on demand but belong to an existing mapping.
>>>
>>> Hope that clears things up!
>>
>> That's very helpful!  So the lifetime of the scatterlist must not
>> outlive the lifetime of the vmalloc allocation.  That means you can call
>> kmap_local_page() on the page in the scatterlist without worrying about
>> the refcount of the struct page.  BTW, you can't call page_address() on
>> vmalloc memory because vmalloc can allocate pages from HIGHMEM.  Unless
>> you're willing to disable support for 32-bit systems with highmem ...
>>
> 
> https://elixir.bootlin.com/linux/v6.11.5/source/drivers/gpu/drm/nouveau/nvkm/core/firmware.c#L266
> 
> This is the C code we want to rustify.

I don't think you want to increase/decrease the refcount there.  Instead 
you tie the lifetime of the returned page to the lifetime of the thing 
that provides the page, which would be some kind of NvkmFirmware struct.

pub enum NvkmFirmwareData {
     Ram(KBox<[PageSlice]>),
     Dma(CoherentAllocation<PageSlice>),
     Sgt(VBox<[PageSlice]>),
}

pub struct NvkmFirmware {
     ...,
     img: NvkmFirmwareData,
}

pub struct NvkmFirmwarePages<'a> {
     fw: &'a NvkmFirmware,
     sgt: SgTable,
}

impl NvkmFirmware {
     fn get_sgl(&self) -> NvkmFirmwarePages<'_> { ... }
}
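
A self-contained userspace sketch of the same lifetime tie, in case it
helps. It assumes a plain `Vec<u8>` as the backing store; `Firmware` and
`FirmwarePages` are hypothetical stand-ins for the NvkmFirmware types
above, not proposed kernel code. The borrow checker, rather than a
refcount, is what stops the page view from outliving the allocation.

```rust
/// Stand-in for the firmware object that owns the backing memory.
pub struct Firmware {
    img: Vec<u8>,
}

/// Stand-in for a scatterlist view. It borrows `Firmware`, so it cannot
/// outlive the allocation it describes.
pub struct FirmwarePages<'a> {
    fw: &'a Firmware,
    page_size: usize,
}

impl Firmware {
    pub fn new(img: Vec<u8>) -> Self {
        Self { img }
    }

    /// Creates a page view over the image; no reference counting involved.
    pub fn pages(&self, page_size: usize) -> FirmwarePages<'_> {
        assert!(page_size > 0, "page_size must be non-zero");
        FirmwarePages { fw: self, page_size }
    }
}

impl<'a> FirmwarePages<'a> {
    /// Number of (possibly partial) pages covering the image.
    pub fn nr_pages(&self) -> usize {
        (self.fw.img.len() + self.page_size - 1) / self.page_size
    }

    /// Borrows page `i`; the slice is tied to the firmware's lifetime `'a`.
    pub fn page(&self, i: usize) -> Option<&'a [u8]> {
        let fw: &'a Firmware = self.fw;
        fw.img.chunks(self.page_size).nth(i)
    }
}
```

Dropping the `Firmware` while a `FirmwarePages` or one of its slices is
still in use is rejected at compile time.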


Perhaps a trait that is implemented by both {K,V,KV}Vec<PageSlice> and 
{K,V,KV}Box<[PageSlice]>, like

trait ToComponentPage {
     fn to_component_page(&self, i: usize) -> &Page;
}

impl ToComponentPage for KVec<PageSlice> { // same for KBox<[PageSlice]>
     fn to_component_page(&self, i: usize) -> &Page {
         let base = &self[i << PAGE_SHIFT..];
         bindings::virt_to_page(base.as_ptr())
     }
}

impl ToComponentPage for VVec<PageSlice> { // same for VBox<[PageSlice]>
     fn to_component_page(&self, i: usize) -> &Page {
         let base = &self[i << PAGE_SHIFT..];
         bindings::vmalloc_to_page(base.as_ptr())
     }
}

?  And possibly also

trait ToComponentPageMut {
     fn to_component_page_mut(&mut self, i: usize) -> &Page;
}

which would be implemented by the Box types, but not by the Vec types 
because their data is not pinned.
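
As a rough userspace illustration of the trait split, assuming pages are
represented by bare page frame numbers: `Pfn`, `Contiguous`, `Scattered`
and `collect_pages` are hypothetical names, and the two impls only mimic
the difference between the virt_to_page() case (arithmetic from a base)
and the vmalloc_to_page() case (per-page lookup).

```rust
pub const PAGE_SHIFT: usize = 12;

/// Hypothetical stand-in for `&Page`: just a page frame number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pfn(pub usize);

pub trait ToComponentPage {
    /// Returns the page backing the `i`-th page-sized chunk, if in range.
    fn to_component_page(&self, i: usize) -> Option<Pfn>;
}

/// Models physically contiguous memory: page `i` follows from the base.
pub struct Contiguous {
    pub base_pfn: usize,
    pub nr_pages: usize,
}

impl ToComponentPage for Contiguous {
    fn to_component_page(&self, i: usize) -> Option<Pfn> {
        if i < self.nr_pages {
            Some(Pfn(self.base_pfn + i))
        } else {
            None
        }
    }
}

/// Models vmalloc-style memory: every page needs its own lookup.
pub struct Scattered {
    pub pfns: Vec<usize>,
}

impl ToComponentPage for Scattered {
    fn to_component_page(&self, i: usize) -> Option<Pfn> {
        self.pfns.get(i).copied().map(Pfn)
    }
}

/// Collects the pages covering `len` bytes, as a scatterlist builder might.
/// Returns `None` if the backing memory is shorter than `len`.
pub fn collect_pages(mem: &dyn ToComponentPage, len: usize) -> Option<Vec<Pfn>> {
    let nr = (len + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
    (0..nr).map(|i| mem.to_component_page(i)).collect()
}
```

The caller of `collect_pages` does not need to know which allocator the
memory came from, which is the property the trait is after.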

Paolo
Jann Horn Nov. 26, 2024, 8:31 p.m. UTC | #17
On Wed, Nov 20, 2024 at 6:02 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > We don't have a fully formed destination yet, so I can't give you a
> > > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > > up the Rust project in any way, but I need to know that what we're trying
> > > > to do will be expressible in Rust.
> > > >
> > > > Can we avoid referring to a page's refcount?
> > >
> > > I don't think this patch needs the refcount at all, and the previous
> > > version did not expose it. This came out of the advice to use put_page
> > > over free_page. Does this mean that we should switch to put_page but
> > > not use get_page?
>
> Did I advise using put_page() over free_page()?  I hope I didn't say
> that.  I don't see a reason why binder needs to refcount its pages (nor
> use a mapcount on them), but I don't fully understand binder so maybe
> it does need a refcount.

I think that was me, at
<https://lore.kernel.org/all/CAG48ez32zWt4mcfA+y2FnzzNmFe-0ns9XQgp=QYeFpRsdiCAnw@mail.gmail.com/>.
Looking at the C binder version, binder_install_single_page() installs
pages into userspace page tables in a VM_MIXEDMAP mapping using
vm_insert_page(), and when you do that with pages from the page
allocator, userspace can grab references to them through GUP-fast (and
I think also through GUP). (See how vm_insert_page() and
vm_get_page_prot() don't use pte_mkspecial(), which is pretty much the
only thing that can stop GUP-fast on most architectures.)

My understanding is that the combination VM_IO|VM_MIXEDMAP would stop
normal GUP, but currently the only way to block GUP-fast is to use
VM_PFNMAP. (Which, as far as I understand, is also why GPU drivers use
VM_PFNMAP so much.) Maybe we should change that, so that VM_IO and/or
VM_MIXEDMAP blocks GUP in the region and causes installed PTEs to be
marked with pte_mkspecial()?

I am not entirely sure about this stuff, but I was recently looking at
net/packet/af_packet.c, and I tested that vmsplice() can grab
references to the high-order compound pages that
alloc_one_pg_vec_page() allocates with __get_free_pages(GFP_KERNEL |
__GFP_COMP | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY, order),
packet_mmap() inserts with vm_insert_page(), and free_pg_vec() drops
with free_pages(). (But that all happens to actually work fine,
free_pages() actually handles refcounted compound pages properly.)
Jann Horn Nov. 26, 2024, 8:43 p.m. UTC | #18
On Tue, Nov 26, 2024 at 9:31 PM Jann Horn <jannh@google.com> wrote:
> On Wed, Nov 20, 2024 at 6:02 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Wed, Nov 20, 2024 at 08:20:16AM -0800, Boqun Feng wrote:
> > > On Wed, Nov 20, 2024 at 10:10:44AM +0100, Alice Ryhl wrote:
> > > > On Wed, Nov 20, 2024 at 5:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > We don't have a fully formed destination yet, so I can't give you a
> > > > > definite answer to a lot of questions.  Obviously I don't want to hold
> > > > > up the Rust project in any way, but I need to know that what we're trying
> > > > > to do will be expressible in Rust.
> > > > >
> > > > > Can we avoid referring to a page's refcount?
> > > >
> > > > I don't think this patch needs the refcount at all, and the previous
> > > > version did not expose it. This came out of the advice to use put_page
> > > > over free_page. Does this mean that we should switch to put_page but
> > > > not use get_page?
> >
> > Did I advise using put_page() over free_page()?  I hope I didn't say
> > that.  I don't see a reason why binder needs to refcount its pages (nor
> > use a mapcount on them), but I don't fully understand binder so maybe
> > it does need a refcount.
>
> I think that was me, at
> <https://lore.kernel.org/all/CAG48ez32zWt4mcfA+y2FnzzNmFe-0ns9XQgp=QYeFpRsdiCAnw@mail.gmail.com/>.
> Looking at the C binder version, binder_install_single_page() installs
> pages into userspace page tables in a VM_MIXEDMAP mapping using
> vm_insert_page(), and when you do that with pages from the page
> allocator, userspace can grab references to them through GUP-fast (and
> I think also through GUP). (See how vm_insert_page() and
> vm_get_page_prot() don't use pte_mkspecial(), which is pretty much the
> only thing that can stop GUP-fast on most architectures.)
>
> My understanding is that the combination VM_IO|VM_MIXEDMAP would stop
> normal GUP, but currently the only way to block GUP-fast is to use
> VM_PFNMAP. (Which, as far as I understand, is also why GPU drivers use
> VM_PFNMAP so much.) Maybe we should change that, so that VM_IO and/or
> VM_MIXEDMAP blocks GUP in the region and causes installed PTEs to be
> marked with pte_mkspecial()?
>
> I am not entirely sure about this stuff, but I was recently looking at
> net/packet/af_packet.c, and I tested that vmsplice() can grab
> references to the high-order compound pages that
> alloc_one_pg_vec_page() allocates with __get_free_pages(GFP_KERNEL |
> __GFP_COMP | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY, order),
> packet_mmap() inserts with vm_insert_page(), and free_pg_vec() drops
> with free_pages(). (But that all happens to actually work fine,
> free_pages() actually handles refcounted compound pages properly.)

And also, the C binder driver wants to free pages in its shrinker
callback, but those pages might still be mapped into userspace. Binder
tries to zap such userspace mappings, but it does that by absolute
virtual address instead of going through the rmap (see
binder_alloc_free_page()), so it will miss page mappings in VMAs that
have been mremap()'d (though legitimate userspace never does that with
binder VMAs) or are concurrently being torn down by munmap(); so
currently the thing that keeps this from falling apart is that if page
mappings are left over somewhere, the page refcount ensures that this
userspace-mapped page doesn't get freed.
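
A small userspace model of that safety net, using `Arc` as a stand-in for
the page refcount. `PageModel` and `DriverSlot` are hypothetical names and
nothing here reflects binder's actual data structures; the sketch only
shows that dropping the driver's reference does not free the page while a
leftover mapping still holds one.

```rust
use std::sync::{Arc, Weak};

/// Stand-in for a refcounted page.
pub struct PageModel {
    pub data: Vec<u8>,
}

/// Driver-side slot holding the driver's own reference to the page.
pub struct DriverSlot {
    page: Option<Arc<PageModel>>,
}

impl DriverSlot {
    pub fn new(data: Vec<u8>) -> Self {
        Self {
            page: Some(Arc::new(PageModel { data })),
        }
    }

    /// Models installing the page in a userspace mapping: the mapping
    /// takes its own reference.
    pub fn map_to_user(&self) -> Option<Arc<PageModel>> {
        self.page.clone()
    }

    /// Models the shrinker: the driver drops its reference. The returned
    /// `Weak` only lets the caller observe whether the page is still alive.
    pub fn shrink(&mut self) -> Option<Weak<PageModel>> {
        let page = self.page.take()?;
        Some(Arc::downgrade(&page))
        // `page`, the driver's reference, is dropped here.
    }
}
```

If a mapping obtained from `map_to_user` is missed by the zap, the page
stays alive after `shrink` and is only freed once that mapping goes away.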

(I think the C binder code does its job, but is not exactly a great
model for how to write a clean driver that integrates nicely with the
rest of the kernel.)
Asahi Lina Dec. 2, 2024, 12:03 p.m. UTC | #19
On 11/19/24 8:24 PM, Abdiel Janulgue wrote:
> This series aims to add support for pages that are not constructed by an
> instance of the rust Page abstraction, for example those returned by
> vmalloc_to_page() or virt_to_page().
> 
> Changes since v3:
> - Use the struct page's reference count to decide when to free the
>   allocation (Alice Ryhl, Boqun Feng).
> - Make Page::page_slice_to_page handle virt_to_page cases as well
>   (Danilo Krummrich).
> - Link to v2: https://lore.kernel.org/lkml/20241022224832.1505432-1-abdiel.janulgue@gmail.com/
> 
> Changes since v2:
> - Use Owned and Ownable types for constructing Page as suggested in
>   instead of using ptr::read().
> - Link to v1: https://lore.kernel.org/rust-for-linux/20241007202752.3096472-1-abdiel.janulgue@gmail.com/
> 
> Abdiel Janulgue (2):
>   rust: page: use the page's reference count to decide when to free the
>     allocation
>   rust: page: Extend support to existing struct page mappings
> 
>  rust/bindings/bindings_helper.h |   1 +
>  rust/helpers/page.c             |  20 +++++
>  rust/kernel/page.rs             | 135 ++++++++++++++++++++++++++++----
>  3 files changed, 142 insertions(+), 14 deletions(-)
> 
> 
> base-commit: b2603f8ac8217bc59f5c7f248ac248423b9b99cb

Just wanted to comment on an upcoming use case I have that will need
this, to make sure we're aligned. I want to use the page allocator to
manage GPU page tables (currently done via an io-pgtable patch and
abstraction but that's going away because it turned out to be too
intrusive to upstream).

Since I'm dealing with page tables which are their own tree ownership
structure, and I don't want to duplicate management of the page life
cycles, this means I need to be able to:

- Convert a Rust-allocated and owned page *into* its physical address
(page_to_phys()).
- Convert a physical address *into* a Rust-allocated and owned page
(phys_to_page()).
- Borrow a Rust Page from a physical address (so I can do read/write
operations on its data without intending to destroy it).

Conceptually, the first two are like ARef::into_raw() and
ARef::from_raw() (or Box for that matter), while the third would
basically return a &Page with an arbitrary lifetime (up to the caller to
enforce the rules). The latter two would be unsafe functions by nature,
of course.

I think this would work just as well with some kind of Owned/Ownable
solution. Basically, I just need to be able to express the two concepts
of "Page owned and allocated by Rust" and "Page borrowed from a physical
address".
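
To illustrate the three operations, here is a userspace sketch that uses a
heap address in place of a physical address. `PageModel`, `into_addr`,
`from_addr` and `borrow_addr` are hypothetical names chosen for this
example; in the kernel these would sit on top of page_to_phys() and
phys_to_page(), which are not modelled here.

```rust
pub const PAGE_SIZE: usize = 4096;

/// Stand-in for an owned, Rust-allocated page.
pub struct PageModel {
    pub data: [u8; PAGE_SIZE],
}

impl PageModel {
    pub fn alloc_zeroed() -> Box<Self> {
        Box::new(Self { data: [0; PAGE_SIZE] })
    }

    /// Gives up ownership and returns the page's address, like `into_raw()`.
    pub fn into_addr(page: Box<Self>) -> usize {
        Box::into_raw(page) as usize
    }

    /// Takes ownership back, like `from_raw()`.
    ///
    /// # Safety
    /// `addr` must come from `into_addr`, must not have been reclaimed
    /// already, and no borrows from `borrow_addr` may still be live.
    pub unsafe fn from_addr(addr: usize) -> Box<Self> {
        unsafe { Box::from_raw(addr as *mut Self) }
    }

    /// Borrows the page without taking ownership.
    ///
    /// # Safety
    /// `addr` must come from `into_addr`, and the caller must ensure the
    /// page is neither reclaimed nor mutated for the chosen lifetime `'a`.
    pub unsafe fn borrow_addr<'a>(addr: usize) -> &'a Self {
        unsafe { &*(addr as *const Self) }
    }
}
```

The lifetime on `borrow_addr` is unconstrained, so upholding it is entirely
the caller's job, which is why both reverse conversions are `unsafe`.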

This maps to pagetable management like this:
- On PT allocation, a Page is allocated, cleared, and turned into its
physical address (to be populated in the parent PTE or top-level TTB)
- On PT free, a page physical address is converted back to a Page, its
PTEs are walked to recursively free child PTs or verify they are empty
entries for leaf PTs (invariant: no leaf PTEs, all mappings should be
removed before PT free) and dropped.
- On PT walk/PTE insertion and removal, a physical address is borrowed
as a Page, then `Page::with_page_mapped()` is used to perform R/W
operations on the PTEs contained within.
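
The allocation/free/walk steps above can be sketched in userspace roughly
as follows. This assumes a tiny 4-entry table where a non-zero entry at a
non-leaf level is the address of a child table; `Table`, `alloc_table`,
`free_table`, `set_entry` and `get_entry` are hypothetical names, and heap
addresses stand in for physical addresses.

```rust
pub const ENTRIES: usize = 4;

/// A page table: 0 means "empty", anything else is a child table's address
/// (at non-leaf levels) or a leaf mapping (at level 0).
pub struct Table {
    entries: [usize; ENTRIES],
}

/// PT allocation: allocate a cleared table and turn it into its address.
pub fn alloc_table() -> usize {
    Box::into_raw(Box::new(Table { entries: [0; ENTRIES] })) as usize
}

/// PT free: convert the address back to an owned table, recurse into child
/// tables, and drop it. Returns the number of tables freed.
///
/// # Safety
/// `addr` must come from `alloc_table`, must not have been freed, and
/// `level` must be the table's real level (0 for leaf tables).
pub unsafe fn free_table(addr: usize, level: u32) -> usize {
    let table = unsafe { Box::from_raw(addr as *mut Table) };
    let mut freed = 1;
    for &entry in table.entries.iter() {
        if entry != 0 {
            // Invariant from the text: leaf PTEs are unmapped before PT free.
            assert!(level > 0, "leaf PTEs must be removed before PT free");
            freed += unsafe { free_table(entry, level - 1) };
        }
    }
    freed
}

/// PTE insertion/removal: borrow the table from its address and write.
///
/// # Safety
/// `addr` must refer to a live table with no other borrows active.
pub unsafe fn set_entry(addr: usize, idx: usize, value: usize) {
    let table = unsafe { &mut *(addr as *mut Table) };
    table.entries[idx] = value;
}

/// PT walk: borrow the table from its address and read one entry.
///
/// # Safety
/// `addr` must refer to a live table that is not being mutated.
pub unsafe fn get_entry(addr: usize, idx: usize) -> usize {
    let table = unsafe { &*(addr as *const Table) };
    table.entries[idx]
}
```

Only `alloc_table` and `free_table` transfer ownership; the walk and update
helpers borrow, so the tree itself remains the single owner of each table.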

Tying the lifetime of actual leaf data pages mapped into the page table
to the page table itself is a higher-level concern that isn't relevant
here: drm_gpuvm handles that part, and those pages are not allocated
directly via the page allocator, but rather as GEM objects which
ultimately come from shmem.

(Note: this hardware is always 64-bit without highmem so those concerns
don't apply here.)

~~ Lina
Alice Ryhl Dec. 3, 2024, 9:08 a.m. UTC | #20
On Mon, Dec 2, 2024 at 1:03 PM Asahi Lina <lina@asahilina.net> wrote:
>
> On 11/19/24 8:24 PM, Abdiel Janulgue wrote:
> > This series aims to add support for pages that are not constructed by an
> > instance of the rust Page abstraction, for example those returned by
> > vmalloc_to_page() or virt_to_page().
> >
> > Changes since v3:
> > - Use the struct page's reference count to decide when to free the
> >   allocation (Alice Ryhl, Boqun Feng).
> > - Make Page::page_slice_to_page handle virt_to_page cases as well
> >   (Danilo Krummrich).
> > - Link to v2: https://lore.kernel.org/lkml/20241022224832.1505432-1-abdiel.janulgue@gmail.com/
> >
> > Changes since v2:
> > - Use Owned and Ownable types for constructing Page as suggested in
> >   instead of using ptr::read().
> > - Link to v1: https://lore.kernel.org/rust-for-linux/20241007202752.3096472-1-abdiel.janulgue@gmail.com/
> >
> > Abdiel Janulgue (2):
> >   rust: page: use the page's reference count to decide when to free the
> >     allocation
> >   rust: page: Extend support to existing struct page mappings
> >
> >  rust/bindings/bindings_helper.h |   1 +
> >  rust/helpers/page.c             |  20 +++++
> >  rust/kernel/page.rs             | 135 ++++++++++++++++++++++++++++----
> >  3 files changed, 142 insertions(+), 14 deletions(-)
> >
> >
> > base-commit: b2603f8ac8217bc59f5c7f248ac248423b9b99cb
>
> Just wanted to comment on an upcoming use case I have that will need
> this, to make sure we're aligned. I want to use the page allocator to
> manage GPU page tables (currently done via an io-pgtable patch and
> abstraction but that's going away because it turned out to be too
> intrusive to upstream).
>
> Since I'm dealing with page tables which are their own tree ownership
> structure, and I don't want to duplicate management of the page life
> cycles, this means I need to be able to:
>
> - Convert a Rust-allocated and owned page *into* its physical address
> (page_to_phys()).
> - Convert a physical address *into* a Rust-allocated and owned page
> (phys_to_page()).
> - Borrow a Rust Page from a physical address (so I can do read/write
> operations on its data without intending to destroy it).
>
> Conceptually, the first two are like ARef::into_raw() and
> ARef::from_raw() (or Box for that matter), while the third would
> basically return a &Page with an arbitrary lifetime (up to the caller to
> enforce the rules). The latter two would be unsafe functions by nature,
> of course.
>
> I think this would work just as well with some kind of Owned/Ownable
> solution. Basically, I just need to be able to express the two concepts
> of "Page owned and allocated by Rust" and "Page borrowed from a physical
> address".

I actually think the Owned/Ownable solution is even better for what
you need, because having a borrowed reference to the current Page
abstraction is pretty awkward as it assumes that the page is always
owned.
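
Roughly what I have in mind, as a userspace sketch only. The names follow
the Owned/Ownable idea from this thread, but the signatures below are my
assumption for illustration and not the proposed kernel API; `PageModel`
and `page_id` are made up for the example.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::ptr::NonNull;
use std::rc::Rc;

/// Types that know how to free themselves when their unique owner goes away.
pub trait Ownable {
    /// # Safety
    /// `this` must be the pointer held by the unique `Owned`, and must not
    /// be used afterwards.
    unsafe fn release(this: NonNull<Self>);
}

/// Unique owner of a `T`; a plain `&T` is then just a borrow with no
/// ownership implied.
pub struct Owned<T: Ownable> {
    ptr: NonNull<T>,
}

impl<T: Ownable> Owned<T> {
    pub fn from_box(b: Box<T>) -> Self {
        Self { ptr: NonNull::from(Box::leak(b)) }
    }
}

impl<T: Ownable> Deref for Owned<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` is valid for as long as this `Owned` exists.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T: Ownable> Drop for Owned<T> {
    fn drop(&mut self) {
        // SAFETY: we are the unique owner and never touch `ptr` again.
        unsafe { T::release(self.ptr) }
    }
}

/// Stand-in page that records when it has been freed.
pub struct PageModel {
    pub id: u32,
    pub released: Rc<Cell<bool>>,
}

impl Drop for PageModel {
    fn drop(&mut self) {
        self.released.set(true);
    }
}

impl Ownable for PageModel {
    unsafe fn release(this: NonNull<Self>) {
        // SAFETY: per the trait contract, `this` came from `Box::leak`.
        drop(unsafe { Box::from_raw(this.as_ptr()) });
    }
}

/// Works on any borrowed page, owned by Rust or not.
pub fn page_id(page: &PageModel) -> u32 {
    page.id
}
```

With this split, `&PageModel` no longer says anything about who frees the
page; only dropping the `Owned<PageModel>` does.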

Alice

> This maps to pagetable management like this:
> - On PT allocation, a Page is allocated, cleared, and turned into its
> physical address (to be populated in the parent PTE or top-level TTB)
> - On PT free, a page physical address is converted back to a Page, its
> PTEs are walked to recursively free child PTs or verify they are empty
> entries for leaf PTs (invariant: no leaf PTEs, all mappings should be
> removed before PT free) and dropped.
> - On PT walk/PTE insertion and removal, a physical address is borrowed
> as a Page, then `Page::with_page_mapped()` is used to perform R/W
> operations on the PTEs contained within.
>
> Tying the lifetime of actual leaf data pages mapped into the page table
> to the page table itself is a higher-level concern that isn't relevant
> here: drm_gpuvm handles that part, and those pages are not allocated
> directly via the page allocator, but rather as GEM objects which
> ultimately come from shmem.
>
> (Note: this hardware is always 64-bit without highmem so those concerns
> don't apply here.)
>
> ~~ Lina
>