diff mbox series

[3/3] rust: add abstraction for `struct page`

Message ID 20240124-alice-mm-v1-3-d1abcec83c44@google.com (mailing list archive)
State New
Headers show
Series Memory management patches needed by Rust Binder | expand

Commit Message

Alice Ryhl Jan. 24, 2024, 11:20 a.m. UTC
Adds a new struct called `Page` that wraps a pointer to `struct page`.
This struct is assumed to hold ownership over the page, so that Rust
code can allocate and manage pages directly.

The page type has various methods for reading and writing into the page.
These methods will temporarily map the page to allow the operation. All
of these methods use a helper that takes an offset and length, performs
bounds checks, and returns a pointer to the given offset in the page.

This patch only adds support for pages of order zero, as that is all
Rust Binder needs. However, it is written to make it easy to add support
for higher-order pages in the future. To do that, you would add a const
generic parameter to `Page` that specifies the order. Most of the
methods do not need to be adjusted, as the logic for dealing with
mapping multiple pages at once can be isolated to just the
`with_pointer_into_page` method. Finally, the struct can be renamed to
`Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.

Rust Binder needs to manage pages directly as that is how transactions
are delivered: Each process has an mmap'd region for incoming
transactions. When an incoming transaction arrives, the Binder driver
will choose a region in the mmap, allocate and map the relevant pages
manually, and copy the incoming transaction directly into the page. This
architecture allows the driver to copy transactions directly from the
address space of one process to another, without an intermediate copy
to a kernel buffer.

This code is based on Wedson's page abstractions from the old rust
branch, but it has been modified by Alice by removing the incomplete
support for higher-order pages, and by introducing the `with_*` helpers
to consolidate the bounds checking logic into a single place.

Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/bindings/bindings_helper.h |   1 +
 rust/helpers.c                  |  20 +++++
 rust/kernel/lib.rs              |   1 +
 rust/kernel/page.rs             | 176 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 198 insertions(+)

Comments

Boqun Feng Jan. 26, 2024, 12:46 a.m. UTC | #1
On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
[...]
> +    /// Runs a piece of code with a raw pointer to a slice of this page, with
> +    /// bounds checking.
> +    ///
> +    /// If `f` is called, then it will be called with a pointer that points at
> +    /// `off` bytes into the page, and the pointer will be valid for at least
> +    /// `len` bytes. The pointer is only valid on this task, as this method uses
> +    /// a local mapping.
> +    ///
> +    /// If `off` and `len` refers to a region outside of this page, then this
> +    /// method returns `EINVAL` and does not call `f`.
> +    pub fn with_pointer_into_page<T>(

Name it as `with_slice_in_page` maybe?

> +        &self,
> +        off: usize,
> +        len: usize,
> +        f: impl FnOnce(*mut u8) -> Result<T>,
> +    ) -> Result<T> {
> +        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
> +
> +        if bounds_ok {
> +            self.with_page_mapped(move |page_addr| {
> +                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
> +                // result in a pointer that is in bounds or one off the end of the page.
> +                f(unsafe { page_addr.cast::<u8>().add(off) })
> +            })
> +        } else {
> +            Err(EINVAL)
> +        }
> +    }
> +
> +    /// Maps the page and reads from it into the given buffer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `dest` is valid for writing `len` bytes.
> +    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {
> +        self.with_pointer_into_page(offset, len, move |from_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `from_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::copy(from_ptr, dest, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Maps the page and writes into it from the given buffer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {

Use a slice like type as `src` maybe? Then the function can be safe:

	pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result

Besides, since `Page` impl `Sync`, shouldn't this `write` and the
`fill_zero` be a `&mut self` function? Or make them both `unsafe`
because of potential race and add some safety requirement?

Regards,
Boqun

> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::copy(src, to_ptr, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Maps the page and zeroes the given slice.
> +    pub fn fill_zero(&self, offset: usize, len: usize) -> Result {
> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::write_bytes(to_ptr, 0u8, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Copies data from userspace into this page.
> +    pub fn copy_into_page(
> +        &self,
> +        reader: &mut UserSlicePtrReader,
> +        offset: usize,
> +        len: usize,
> +    ) -> Result {
> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { reader.read_raw(to_ptr, len) }
> +        })
> +    }
> +}
> +
> +impl Drop for Page {
> +    fn drop(&mut self) {
> +        // SAFETY: By the type invariants, we have ownership of the page and can
> +        // free it.
> +        unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
> +    }
> +}
> 
> -- 
> 2.43.0.429.g432eaa2c6b-goog
>
Alice Ryhl Jan. 26, 2024, 12:33 p.m. UTC | #2
On Fri, Jan 26, 2024 at 1:47 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > +    /// Maps the page and reads from it into the given buffer.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// Callers must ensure that `dest` is valid for writing `len` bytes.
> > +    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {
> > +        self.with_pointer_into_page(offset, len, move |from_ptr| {
> > +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> > +            // it has performed a bounds check and guarantees that `from_ptr` is
> > +            // valid for `len` bytes.
> > +            unsafe { ptr::copy(from_ptr, dest, len) };
> > +            Ok(())
> > +        })
> > +    }
> > +
> > +    /// Maps the page and writes into it from the given buffer.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> > +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
>
> Use a slice like type as `src` maybe? Then the function can be safe:
>
>         pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
>
> Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> because of potential race and add some safety requirement?

Ideally, we don't want data races with these methods to be UB. They
could be mapped into the address space of a userspace process.

Alice
Boqun Feng Jan. 26, 2024, 6:28 p.m. UTC | #3
On Fri, Jan 26, 2024 at 01:33:46PM +0100, Alice Ryhl wrote:
> On Fri, Jan 26, 2024 at 1:47 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > > +    /// Maps the page and reads from it into the given buffer.
> > > +    ///
> > > +    /// # Safety
> > > +    ///
> > > +    /// Callers must ensure that `dest` is valid for writing `len` bytes.
> > > +    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {
> > > +        self.with_pointer_into_page(offset, len, move |from_ptr| {
> > > +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> > > +            // it has performed a bounds check and guarantees that `from_ptr` is
> > > +            // valid for `len` bytes.
> > > +            unsafe { ptr::copy(from_ptr, dest, len) };
> > > +            Ok(())
> > > +        })
> > > +    }
> > > +
> > > +    /// Maps the page and writes into it from the given buffer.
> > > +    ///
> > > +    /// # Safety
> > > +    ///
> > > +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> > > +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
> >
> > Use a slice like type as `src` maybe? Then the function can be safe:
> >
> >         pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
> >
> > Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> > `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> > because of potential race and add some safety requirement?
> 
> Ideally, we don't want data races with these methods to be UB. They

I understand that, but in the current code, you can write:

	CPU 0			CPU 1
	=====			=====

	page.write(src1, 0, 8);
				page.write(src2, 0, 8);

and it's a data race at kernel end. So my question is more how we can
prevent the UB ;-)

Regards,
Boqun

> could be mapped into the address space of a userspace process.
> 
> Alice
Matthew Wilcox Jan. 29, 2024, 5:59 p.m. UTC | #4
On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> Adds a new struct called `Page` that wraps a pointer to `struct page`.
> This struct is assumed to hold ownership over the page, so that Rust
> code can allocate and manage pages directly.

OK ...

> This patch only adds support for pages of order zero, as that is all
> Rust Binder needs. However, it is written to make it easy to add support
> for higher-order pages in the future. To do that, you would add a const
> generic parameter to `Page` that specifies the order. Most of the
> methods do not need to be adjusted, as the logic for dealing with
> mapping multiple pages at once can be isolated to just the
> `with_pointer_into_page` method. Finally, the struct can be renamed to
> `Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.

This description concerns me because it reads like you're not keeping
up with the current thinking in MM about what pages are and how we're
improving the type hierarchy.  As in, we're creating one instead of
allowing the current mish-mash of absolutely everything to continue.

Are you the right person to ask about the operations that Binder does
with a page so we can figure out where it fits in the type hierarchy?

> Rust Binder needs to manage pages directly as that is how transactions
> are delivered: Each process has an mmap'd region for incoming
> transactions. When an incoming transaction arrives, the Binder driver
> will choose a region in the mmap, allocate and map the relevant pages
> manually, and copy the incoming transaction directly into the page. This
> architecture allows the driver to copy transactions directly from the
> address space of one process to another, without an intermediate copy
> to a kernel buffer.

Everything about this says "This is what a first year comp sci student
thinks will be fast".  Oh well, the thinking here isn't your fault.

> @@ -127,6 +129,24 @@ int rust_helper_signal_pending(struct task_struct *t)
>  }
>  EXPORT_SYMBOL_GPL(rust_helper_signal_pending);
>  
> +struct page *rust_helper_alloc_pages(gfp_t gfp_mask, unsigned int order)
> +{
> +       return alloc_pages(gfp_mask, order);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_alloc_pages);
> +
> +void *rust_helper_kmap_local_page(struct page *page)
> +{
> +       return kmap_local_page(page);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kmap_local_page);
> +
> +void rust_helper_kunmap_local(const void *addr)
> +{
> +       kunmap_local(addr);
> +}
> +EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);

I remain opposed to all these fidgetty little helpers.  Particularly
when they're noops on machines without HIGHMEM, which is ~all of them.

> +/// A bitwise shift for the page size.
> +pub const PAGE_SHIFT: usize = bindings::PAGE_SHIFT as usize;

Does PAGE_SHIFT really need to be as large as 'usize'?  If it's more
than 63 by the time I retire, I'll be shocked.  If it's more than 127
by the time I die, I'll be even more shocked.  And it won't get to 255
by the heat death of the universe.

> +/// The number of bytes in a page.
> +pub const PAGE_SIZE: usize = 1 << PAGE_SHIFT;

This is appropriately usize.

> +/// A bitwise mask for the page size.
> +pub const PAGE_MASK: usize = PAGE_SIZE - 1;

Are you trying to get somebody killed?

include/asm-generic/page.h:#define PAGE_MASK    (~(PAGE_SIZE-1))

Defining PAGE_MASK to be the opposite set of bits in C and Rust is
going to bite us all day every day for a decade.

> +impl Page {
> +    /// Allocates a new set of contiguous pages.
> +    pub fn new() -> Result<Self, AllocError> {
> +        // SAFETY: These are the correct arguments to allocate a single page.
> +        let page = unsafe {
> +            bindings::alloc_pages(
> +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
> +                0,
> +            )
> +        };

This feels too Binder-specific to be 'Page'.  Pages are not necessarily
allocated with GFP_HIGHMEM, nor are they necessarily zeroed.  Maybe you
want a BinderPage type?
Carlos Llamas Jan. 29, 2024, 6:56 p.m. UTC | #5
On Mon, Jan 29, 2024 at 05:59:53PM +0000, Matthew Wilcox wrote:
> On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > Adds a new struct called `Page` that wraps a pointer to `struct page`.
> > This struct is assumed to hold ownership over the page, so that Rust
> > code can allocate and manage pages directly.
> 
> OK ...
> 
> > This patch only adds support for pages of order zero, as that is all
> > Rust Binder needs. However, it is written to make it easy to add support
> > for higher-order pages in the future. To do that, you would add a const
> > generic parameter to `Page` that specifies the order. Most of the
> > methods do not need to be adjusted, as the logic for dealing with
> > mapping multiple pages at once can be isolated to just the
> > `with_pointer_into_page` method. Finally, the struct can be renamed to
> > `Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.
> 
> This description concerns me because it reads like you're not keeping
> up with the current thinking in MM about what pages are and how we're
> improving the type hierarchy.  As in, we're creating one instead of
> allowing the current mish-mash of absolutely everything to continue.
> 
> Are you the right person to ask about the operations that Binder does
> with a page so we can figure out where it fits in the type hierarchy?
 
I would guess you are suggesting a transition to folios here? I don't
think there is anything in binder that would impede such a change. The
core idea behind binder IPC is to skip kernel buffering and perform
instead a "copy-once" of messages across users memory. In theory this
seems efficient but I haven't seen any data proving so. So take that
with a grain of salt.

The size of these binder messages is not limited per se and can trigger
the allocation of multiple pages. However, in reality the vast majority
of transactions are under 1K payload. FWICT, it seems reasonable to
switch over to folios.

The only concern I have is that we've implemented a binder LRU-shrinker
mechanism. We add the unused pages to our freelist and give them back to
the system on demand. However, if a new transaction requests the unused
page before it gets reclaimed it is simply removed from this freelist.
This is convenient as we avoid taking the mmap sem during this process.
I don't know how this mechanism would look with folios though?

--
Carlos Llamas
Matthew Wilcox Jan. 29, 2024, 8:19 p.m. UTC | #6
On Mon, Jan 29, 2024 at 06:56:58PM +0000, Carlos Llamas wrote:
> On Mon, Jan 29, 2024 at 05:59:53PM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > > This patch only adds support for pages of order zero, as that is all
> > > Rust Binder needs. However, it is written to make it easy to add support
> > > for higher-order pages in the future. To do that, you would add a const
> > > generic parameter to `Page` that specifies the order. Most of the
> > > methods do not need to be adjusted, as the logic for dealing with
> > > mapping multiple pages at once can be isolated to just the
> > > `with_pointer_into_page` method. Finally, the struct can be renamed to
> > > `Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.
> > 
> > This description concerns me because it reads like you're not keeping
> > up with the current thinking in MM about what pages are and how we're
> > improving the type hierarchy.  As in, we're creating one instead of
> > allowing the current mish-mash of absolutely everything to continue.
> > 
> > Are you the right person to ask about the operations that Binder does
> > with a page so we can figure out where it fits in the type hierarchy?
>  
> I would guess you are suggesting a transition to folios here? I don't

I don't think folios are the right type to use.  Folios are for files and
anonymous memory; things which are managed on the LRU, have refcounts and
mapcounts, can be found with an rmap, need private data, belong to memory
control groups, belong to either an inode or an anon_vma, and so on.

It's _possible_ that Binder fits this use case well enough, but my
guess is that it needs its own type, or maybe it's the initial example
of a different type from folios (right now we have three types: folios,
slabs and ptdescs, but more are on their way).

> The only concern I have is that we've implemented a binder LRU-shrinker
> mechanism. We add the unused pages to our freelist and give them back to
> the system on demand. However, if a new transaction requests the unused
> page before it gets reclaimed it is simply removed from this freelist.
> This is convenient as we avoid taking the mmap sem during this process.
> I don't know how this mechanism would look with folios though?

This doesn't seem like too much of a problem.  The key thing is that
with memdescs, you get to define your own data type of whatever size
makes sense for you.  Until then you're limited to what we can fit into
a struct page (and we need to be careful not to step on stuff that other
people look at like the refcount).
Alice Ryhl Jan. 29, 2024, 9:27 p.m. UTC | #7
On Mon, Jan 29, 2024 at 6:59 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > Adds a new struct called `Page` that wraps a pointer to `struct page`.
> > This struct is assumed to hold ownership over the page, so that Rust
> > code can allocate and manage pages directly.
>
> OK ...

Thank you for taking your time to review my wrappers!

> > This patch only adds support for pages of order zero, as that is all
> > Rust Binder needs. However, it is written to make it easy to add support
> > for higher-order pages in the future. To do that, you would add a const
> > generic parameter to `Page` that specifies the order. Most of the
> > methods do not need to be adjusted, as the logic for dealing with
> > mapping multiple pages at once can be isolated to just the
> > `with_pointer_into_page` method. Finally, the struct can be renamed to
> > `Pages<ORDER>`, and the type alias `Page = Pages<0>` can be introduced.
>
> This description concerns me because it reads like you're not keeping
> up with the current thinking in MM about what pages are and how we're
> improving the type hierarchy.  As in, we're creating one instead of
> allowing the current mish-mash of absolutely everything to continue.

That's very possible. I have a good understanding about how C binder
interacts with pages, but I don't know too much about the abstractions
that Binder is not using.

> Are you the right person to ask about the operations that Binder does
> with a page so we can figure out where it fits in the type hierarchy?

I can definitely answer questions about that. If we can find another
abstraction that is not too far away from what we are doing today,
then I am open to looking into whether we can do that instead of the
current approach. However, I want to avoid large deviations from C
Binder, at least before I get Rust binder into the kernel tree.

A short overview of what Binder does:
* Every process will mmap a region of memory. This memory region will
contain the data for all incoming transactions sent to that process.
Only the kernel can modify these pages.
* Binder has a data structure that keeps track of where the
allocations in this mmap'd region are. It has functionality to find an
open interval of a given size (so it's essentially an allocator). This
is called the "range allocator".
* The pages in this region are allocated lazily whenever the range
allocator starts using the page.
* When the range allocator stops using a page, it is marked as unused
and put on an LRU list.
* There's a custom shrinker that can free pages not currently used by
any allocation.

So when process A sends a transaction to process B using the
appropriate ioctl, process A will go into the range allocator for
process B and reserve a range for the transaction that A is sending.
If any pages in the resulting range are missing, then new pages are
allocated, and process A will vm_insert_page them into process B's
mmap. Then, process A will map the page with kmap_local_page, and use
copy_from_user to copy data *directly* from A's userspace into a page
in process B's address space.

Note that transactions don't have uniform sizes. Think of them as
arbitrary buffers provided by userspace. They will generally not be
larger than a few hundred bytes each, but larger transactions are
possible. The mmap for a process is usually 4 MB.

The biggest user of Page is here in the RFC:
https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-19-08ba9197f637@google.com/

The range allocator is defined here:
https://lore.kernel.org/rust-for-linux/20231101-rust-binder-v1-6-08ba9197f637@google.com/

> > Rust Binder needs to manage pages directly as that is how transactions
> > are delivered: Each process has an mmap'd region for incoming
> > transactions. When an incoming transaction arrives, the Binder driver
> > will choose a region in the mmap, allocate and map the relevant pages
> > manually, and copy the incoming transaction directly into the page. This
> > architecture allows the driver to copy transactions directly from the
> > address space of one process to another, without an intermediate copy
> > to a kernel buffer.
>
> Everything about this says "This is what a first year comp sci student
> thinks will be fast".  Oh well, the thinking here isn't your fault.

Ultimately, I am just replicating what C Binder does.

I had a long discussion with Liam Howlett at Plumbers where we
discussed various alternatives to the hand-rolled stuff that Binder
does. Liam thought that we may be able to replace the entire thing
with maple trees. These are things that I definitely want to
experiment with, but I am reluctant to try replacing the entire thing
with a maple tree, at least until I get Rust Binder into the kernel
tree.

In general, there are many places in Binder where we are hand-rolling
something that has an alternative elsewhere in the kernel, but
replacing them is not always trivial. The hand-rolled versions often
have Binder-specific optimizations that make it a regression to
replace it with the general thing.

> > @@ -127,6 +129,24 @@ int rust_helper_signal_pending(struct task_struct *t)
> >  }
> >  EXPORT_SYMBOL_GPL(rust_helper_signal_pending);
> >
> > +struct page *rust_helper_alloc_pages(gfp_t gfp_mask, unsigned int order)
> > +{
> > +       return alloc_pages(gfp_mask, order);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_alloc_pages);
> > +
> > +void *rust_helper_kmap_local_page(struct page *page)
> > +{
> > +       return kmap_local_page(page);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_kmap_local_page);
> > +
> > +void rust_helper_kunmap_local(const void *addr)
> > +{
> > +       kunmap_local(addr);
> > +}
> > +EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
>
> I remain opposed to all these fidgetty little helpers.  Particularly
> when they're noops on machines without HIGHMEM, which is ~all of them.

I don't disagree with you, but there's not much I can do about them. I
can wrap them in #ifdef HIGHMEM if they are no-ops or exported without
HIGHMEM?

> > +/// A bitwise shift for the page size.
> > +pub const PAGE_SHIFT: usize = bindings::PAGE_SHIFT as usize;
>
> Does PAGE_SHIFT really need to be as large as 'usize'?  If it's more
> than 63 by the time I retire, I'll be shocked.  If it's more than 127
> by the time I die, I'll be even more shocked.  And it won't get to 255
> by the heat death of the universe.

Rust usually requires that both operands to an integer operation are
of the same integer type, requiring explicit conversions if that is
not the case. That is why I am using usize here.

However, it seems like Rust doesn't actually require that for the <<
operator, so I guess it doesn't matter much for this particular
constant.

> > +/// A bitwise mask for the page size.
> > +pub const PAGE_MASK: usize = PAGE_SIZE - 1;
>
> Are you trying to get somebody killed?
>
> include/asm-generic/page.h:#define PAGE_MASK    (~(PAGE_SIZE-1))
>
> Defining PAGE_MASK to be the opposite set of bits in C and Rust is
> going to bite us all day every day for a decade.

Oops, that's embarrassing. Thank you for catching that. I'll make sure
to change it.

> > +impl Page {
> > +    /// Allocates a new set of contiguous pages.
> > +    pub fn new() -> Result<Self, AllocError> {
> > +        // SAFETY: These are the correct arguments to allocate a single page.
> > +        let page = unsafe {
> > +            bindings::alloc_pages(
> > +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
> > +                0,
> > +            )
> > +        };
>
> This feels too Binder-specific to be 'Page'.  Pages are not necessarily
> allocated with GFP_HIGHMEM, nor are they necessarily zeroed.  Maybe you
> want a BinderPage type?

We can add a constructor that takes a flag argument, so this type
doesn't necessarily have to be tied to those specific arguments.

Alice
Andreas Hindborg Jan. 30, 2024, 9:02 a.m. UTC | #8
Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
>> +impl Page {
>> +    /// Allocates a new set of contiguous pages.
>> +    pub fn new() -> Result<Self, AllocError> {
>> +        // SAFETY: These are the correct arguments to allocate a single page.
>> +        let page = unsafe {
>> +            bindings::alloc_pages(
>> +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
>> +                0,
>> +            )
>> +        };
>
> This feels too Binder-specific to be 'Page'.  Pages are not necessarily
> allocated with GFP_HIGHMEM, nor are they necessarily zeroed.  Maybe you
> want a BinderPage type?

Rust null block uses the same definition of these flags [1], so there is
at least that synergy.

I feel like we had the discussion of the flags before, although I can't
find the thread now. I think the conclusion was that we fix them until
we have code that need to actually change them (as to not add dead
code).

BR Andreas

[1] https://github.com/metaspace/linux/blob/702026e6645193fc89b7d55e00dac75fd492bfb8/rust/kernel/pages.rs#L28
Alice Ryhl Jan. 30, 2024, 9:06 a.m. UTC | #9
On Tue, Jan 30, 2024 at 10:02 AM Andreas Hindborg
<a.hindborg@samsung.com> wrote:
>
>
> Matthew Wilcox <willy@infradead.org> writes:
>
> > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> >> +impl Page {
> >> +    /// Allocates a new set of contiguous pages.
> >> +    pub fn new() -> Result<Self, AllocError> {
> >> +        // SAFETY: These are the correct arguments to allocate a single page.
> >> +        let page = unsafe {
> >> +            bindings::alloc_pages(
> >> +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
> >> +                0,
> >> +            )
> >> +        };
> >
> > This feels too Binder-specific to be 'Page'.  Pages are not necessarily
> > allocated with GFP_HIGHMEM, nor are they necessarily zeroed.  Maybe you
> > want a BinderPage type?
>
> Rust null block uses the same definition of these flags [1], so there is
> at least that synergy.
>
> I feel like we had the discussion of the flags before, although I can't
> find the thread now. I think the conclusion was that we fix them until
> we have code that need to actually change them (as to not add dead
> code).

I don't think there's any problem with replacing the constructor with
one that takes a flag argument dead-code-wise. I can definitely do
that.

Alice
Andreas Hindborg Jan. 30, 2024, 9:15 a.m. UTC | #10
Boqun Feng <boqun.feng@gmail.com> writes:

> On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:

>> +
>> +    /// Maps the page and writes into it from the given buffer.
>> +    ///
>> +    /// # Safety
>> +    ///
>> +    /// Callers must ensure that `src` is valid for reading `len` bytes.
>> +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
>
> Use a slice like type as `src` maybe? Then the function can be safe:
>
> 	pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
>
> Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> because of potential race and add some safety requirement?

We can add a safe version that takes a slice later, as here [1]. Same
for the with_* that take a closure.

It would be nice to model ownership of pages that are only mapped in
kernel with `&mut`.

BR Andreas

[1] https://github.com/metaspace/linux/blob/702026e6645193fc89b7d55e00dac75fd492bfb8/rust/kernel/pages.rs#L178
Trevor Gross Feb. 1, 2024, 6:02 a.m. UTC | #11
On Wed, Jan 24, 2024 at 6:22 AM Alice Ryhl <aliceryhl@google.com> wrote:
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Kernel page allocation and management.
> +
> +use crate::{bindings, error::code::*, error::Result, user_ptr::UserSlicePtrReader};
> +use core::{
> +    alloc::AllocError,
> +    ffi::c_void,
> +    ptr::{self, NonNull},
> +};
> +
> +/// A bitwise shift for the page size.
> +pub const PAGE_SHIFT: usize = bindings::PAGE_SHIFT as usize;
> +/// The number of bytes in a page.
> +pub const PAGE_SIZE: usize = 1 << PAGE_SHIFT;
> +/// A bitwise mask for the page size.
> +pub const PAGE_MASK: usize = PAGE_SIZE - 1;
> +
> +/// A pointer to a page that owns the page allocation.
> +///
> +/// # Invariants
> +///
> +/// The pointer points at a page, and has ownership over the page.
> +pub struct Page {
> +    page: NonNull<bindings::page>,
> +}

Shouldn't this be UnsafeCell / Opaque? Since `struct page` contains locks.

> +// SAFETY: It is safe to transfer page allocations between threads.
> +unsafe impl Send for Page {}
> +
> +// SAFETY: Calling `&self` methods on this type in parallel is safe. It might
> +// allow you to perform a data race on bytes stored in the page, but we treat
> +// this like data races on user pointers.
> +unsafe impl Sync for Page {}

These races should probably be in the Page docs, rather than pointing
to user pointers.

> +impl Page {
> +    /// Allocates a new set of contiguous pages.

"set of contiguous page" -> "page"?

> +    pub fn new() -> Result<Self, AllocError> {
> +        // SAFETY: These are the correct arguments to allocate a single page.
> +        let page = unsafe {
> +            bindings::alloc_pages(
> +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
> +                0,
> +            )
> +        };
> +
> +        match NonNull::new(page) {
> +            // INVARIANT: We checked that the allocation above succeeded.
> +            Some(page) => Ok(Self { page }),
> +            None => Err(AllocError),
> +        }

Optionally:

    let page = NonNull::new(page).ok_or(AllocError)?;
    Ok(Self { page })

> +    }
> +
> +    /// Returns a raw pointer to the page.

Maybe add ", valid for PAGE_SIZE" or similar to make this obvious.

> +    pub fn as_ptr(&self) -> *mut bindings::page {
> +        self.page.as_ptr()
> +    }
> +
> +    /// Runs a piece of code with this page mapped to an address.

Maybe ", then immediately unmaps the page" to make the entire operation clear.

> +    /// It is up to the caller to use the provided raw pointer correctly.
> +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut c_void) -> T) -> T {

If there is exclusive access into the page, this signature could be:

    FnOnce(&mut [u8; PAGE_SIZE]) -> T

Otherwise possibly

    FnOnce(*mut [u8; PAGE_SIZE]) -> T

But based on the thread with Boqun it seems there is no synchronized
access here. In this case, "use the provided raw pointer correctly" or
the type level docs should clarify what you can and can't rely on with
pointers into a page.

E.g. if I'm understanding correctly, you can never construct a &T or
&mut T anywhere in this page unless T is Sync.

> +        // SAFETY: `page` is valid due to the type invariants on `Page`.
> +        let mapped_addr = unsafe { bindings::kmap_local_page(self.as_ptr()) };
> +
> +        let res = f(mapped_addr);
> +
> +        // SAFETY: This unmaps the page mapped above.
> +        //
> +        // Since this API takes the user code as a closure, it can only be used
> +        // in a manner where the pages are unmapped in reverse order. This is as
> +        // required by `kunmap_local`.
> +        //
> +        // In other words, if this call to `kunmap_local` happens when a
> +        // different page should be unmapped first, then there must necessarily
> +        // be a call to `kmap_local_page` other than the call just above in
> +        // `with_page_mapped` that made that possible. In this case, it is the
> +        // unsafe block that wraps that other call that is incorrect.
> +        unsafe { bindings::kunmap_local(mapped_addr) };
> +
> +        res
> +    }
> +
> +    /// Runs a piece of code with a raw pointer to a slice of this page, with
> +    /// bounds checking.
> +    ///
> +    /// If `f` is called, then it will be called with a pointer that points at
> +    /// `off` bytes into the page, and the pointer will be valid for at least
> +    /// `len` bytes. The pointer is only valid on this task, as this method uses
> +    /// a local mapping./
> +    ///
> +    /// If `off` and `len` refers to a region outside of this page, then this
> +    /// method returns `EINVAL` and does not call `f`.
> +    pub fn with_pointer_into_page<T>(
> +        &self,
> +        off: usize,
> +        len: usize,
> +        f: impl FnOnce(*mut u8) -> Result<T>,
> +    ) -> Result<T> {

Same question about exclusive access

    impl FnOnce(&mut [u8]) -> Result<T>

> +        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
> +
> +        if bounds_ok {
> +            self.with_page_mapped(move |page_addr| {
> +                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
> +                // result in a pointer that is in bounds or one off the end of the page.
> +                f(unsafe { page_addr.cast::<u8>().add(off) })
> +            })
> +        } else {
> +            Err(EINVAL)
> +        }
> +    }
> +
> +    /// Maps the page and reads from it into the given buffer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `dest` is valid for writing `len` bytes.
> +    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {

Is there a reason not to use a slice just for a destination to read into?

> +        self.with_pointer_into_page(offset, len, move |from_ptr| {

Nit: do the names from_ptr/to_ptr come from existing binder? src/dst
seems more common (also dst vs. dest).

> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `from_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::copy(from_ptr, dest, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Maps the page and writes into it from the given buffer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {

Same slice question

> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::copy(src, to_ptr, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Maps the page and zeroes the given slice.

Mention that this will error with the same conditions as with_pointer_into_page.

> +    pub fn fill_zero(&self, offset: usize, len: usize) -> Result {
> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { ptr::write_bytes(to_ptr, 0u8, len) };
> +            Ok(())
> +        })
> +    }
> +
> +    /// Copies data from userspace into this page.
> +    pub fn copy_into_page(
> +        &self,
> +        reader: &mut UserSlicePtrReader,
> +        offset: usize,
> +        len: usize,
> +    ) -> Result {

Maybe copy_from_user_slice or something that includes "user", since
as-is it sounds like copying a page into another page.

Also, docs should point out the error condition.

> +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> +            // it has performed a bounds check and guarantees that `to_ptr` is
> +            // valid for `len` bytes.
> +            unsafe { reader.read_raw(to_ptr, len) }
> +        })
> +    }
> +}
> +
> +impl Drop for Page {
> +    fn drop(&mut self) {
> +        // SAFETY: By the type invariants, we have ownership of the page and can
> +        // free it.
> +        unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
> +    }
> +}
>
> --
> 2.43.0.429.g432eaa2c6b-goog
>
>

- Trevor
Trevor Gross Feb. 1, 2024, 6:50 a.m. UTC | #12
On Fri, Jan 26, 2024 at 1:28 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Fri, Jan 26, 2024 at 01:33:46PM +0100, Alice Ryhl wrote:
> > On Fri, Jan 26, 2024 at 1:47 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > >
> > > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > > > [...]
> > > > +    /// Maps the page and writes into it from the given buffer.
> > > > +    ///
> > > > +    /// # Safety
> > > > +    ///
> > > > +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> > > > +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
> > >
> > > Use a slice like type as `src` maybe? Then the function can be safe:
> > >
> > >         pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
> > >
> > > Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> > > `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> > > because of potential race and add some safety requirement?
> >
> > Ideally, we don't want data races with these methods to be UB. They
>
> I understand that, but in the current code, you can write:
>
>         CPU 0                   CPU 1
>         =====                   =====
>
>         page.write(src1, 0, 8);
>                                 page.write(src2, 0, 8);
>
> and it's a data race at kernel end. So my question is more how we can
> prevent the UB ;-)

Hm. Would the following work?

    // Change existing functions to work with references, meaning they need an
    // exclusive &mut self
    pub fn with_page_mapped<T>(
        &mut self,
        f: impl FnOnce(&mut [u8; PAGE_SIZE]) -> T
    ) -> T

    pub fn with_pointer_into_page<T>(
        &mut self,
        off: usize,
        len: usize,
        f: impl FnOnce(&mut [u8]) -> Result<T>,
    ) -> Result<T>

    // writing methods now take &mut self
    pub fn write(&mut self ...)
    pub fn fill_zero(&mut self ...)
    pub fn copy_into_page(&mut self ...)

    // Add two new functions that take &self, but return shared access
    pub fn with_page_mapped_raw<T>(
        &self,
        f: impl FnOnce(&UnsafeCell<[u8; PAGE_SIZE]>) -> T
    ) -> T

    pub fn with_pointer_into_page_raw<T>(
        &self,
        off: usize,
        len: usize,
        f: impl FnOnce(&[UnsafeCell<u8>]) -> Result<T>,
    ) -> Result<T>

This would mean that anyone who can obey rust's mutability rules can
use a page without any safety or race conditions to worry about, much
better for usability.

But if you do need to allow the data to be shared and racy, such as
the userspace example, the `_raw` methods allow for that and you can
`.get()` a `*mut u8` from the UnsafeCell. This moves the interior
mutability only to the mapped data rather than the Page itself, which
I think is more accurate anyway.

Leveraging UnsafeCell would also make some things with UserSlicePtr
more clear too.

- Trevor

> Regards,
> Boqun
>
> > could be mapped into the address space of a userspace process.
> >
> > Alice
>
Boqun Feng Feb. 5, 2024, 5:23 p.m. UTC | #13
On Thu, Feb 01, 2024 at 01:50:53AM -0500, Trevor Gross wrote:
> On Fri, Jan 26, 2024 at 1:28 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Fri, Jan 26, 2024 at 01:33:46PM +0100, Alice Ryhl wrote:
> > > On Fri, Jan 26, 2024 at 1:47 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > > > > [...]
> > > > > +    /// Maps the page and writes into it from the given buffer.
> > > > > +    ///
> > > > > +    /// # Safety
> > > > > +    ///
> > > > > +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> > > > > +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
> > > >
> > > > Use a slice like type as `src` maybe? Then the function can be safe:
> > > >
> > > >         pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
> > > >
> > > > Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> > > > `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> > > > because of potential race and add some safety requirement?
> > >
> > > Ideally, we don't want data races with these methods to be UB. They
> >
> > I understand that, but in the current code, you can write:
> >
> >         CPU 0                   CPU 1
> >         =====                   =====
> >
> >         page.write(src1, 0, 8);
> >                                 page.write(src2, 0, 8);
> >
> > and it's a data race at kernel end. So my question is more how we can
> > prevent the UB ;-)
> 
> Hm. Would the following work?
> 
>     // Change existing functions to work with references, meaning they need an
>     // exclusive &mut self
>     pub fn with_page_mapped<T>(
>         &mut self,
>         f: impl FnOnce(&mut [u8; PAGE_SIZE]) -> T
>     ) -> T
> 
>     pub fn with_pointer_into_page<T>(
>         &mut self,
>         off: usize,
>         len: usize,
>         f: impl FnOnce(&mut [u8]) -> Result<T>,
>     ) -> Result<T>
> 
>     // writing methods now take &mut self
>     pub fn write(&mut self ...)
>     pub fn fill_zero(&mut self ...)
>     pub fn copy_into_page(&mut self ...)
> 
>     // Add two new functions that take &self, but return shared access
>     pub fn with_page_mapped_raw<T>(
>         &self,
>         f: impl FnOnce(&UnsafeCell<[u8; PAGE_SIZE]>) -> T
>     ) -> T
> 
>     pub fn with_pointer_into_page_raw<T>(
>         &self,
>         off: usize,
>         len: usize,
>         f: impl FnOnce(&[UnsafeCell<u8>]) -> Result<T>,
>     ) -> Result<T>
> 
> This would mean that anyone who can obey rust's mutability rules can
> use a page without any safety or race conditions to worry about, much
> better for usability.
> 
> But if you do need to allow the data to be shared and racy, such as
> the userspace example, the `_raw` methods allow for that and you can
> `.get()` a `*mut u8` from the UnsafeCell. This moves the interior
> mutability only to the mapped data rather than the Page itself, which
> I think is more accurate anyway.
> 

Looks good to me ;-) Thanks!

Regards,
Boqun

> Leveraging UnsafeCell would also make some things with UserSlicePtr
> more clear too.
> 
> - Trevor
> 
> > Regards,
> > Boqun
> >
> > > could be mapped into the address space of a userspace process.
> > >
> > > Alice
> >
>
Alice Ryhl Feb. 8, 2024, 1:36 p.m. UTC | #14
On Thu, Feb 1, 2024 at 7:51 AM Trevor Gross <tmgross@umich.edu> wrote:
>
> On Fri, Jan 26, 2024 at 1:28 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Fri, Jan 26, 2024 at 01:33:46PM +0100, Alice Ryhl wrote:
> > > On Fri, Jan 26, 2024 at 1:47 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > > >
> > > > On Wed, Jan 24, 2024 at 11:20:23AM +0000, Alice Ryhl wrote:
> > > > > [...]
> > > > > +    /// Maps the page and writes into it from the given buffer.
> > > > > +    ///
> > > > > +    /// # Safety
> > > > > +    ///
> > > > > +    /// Callers must ensure that `src` is valid for reading `len` bytes.
> > > > > +    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
> > > >
> > > > Use a slice like type as `src` maybe? Then the function can be safe:
> > > >
> > > >         pub fn write<S: AsRef<[u8]>>(&self, src: S, offset: usize) -> Result
> > > >
> > > > Besides, since `Page` impl `Sync`, shouldn't this `write` and the
> > > > `fill_zero` be a `&mut self` function? Or make them both `unsafe`
> > > > because of potential race and add some safety requirement?
> > >
> > > Ideally, we don't want data races with these methods to be UB. They
> >
> > I understand that, but in the current code, you can write:
> >
> >         CPU 0                   CPU 1
> >         =====                   =====
> >
> >         page.write(src1, 0, 8);
> >                                 page.write(src2, 0, 8);
> >
> > and it's a data race at kernel end. So my question is more how we can
> > prevent the UB ;-)
>
> Hm. Would the following work?
>
>     // Change existing functions to work with references, meaning they need an
>     // exclusive &mut self
>     pub fn with_page_mapped<T>(
>         &mut self,
>         f: impl FnOnce(&mut [u8; PAGE_SIZE]) -> T
>     ) -> T
>
>     pub fn with_pointer_into_page<T>(
>         &mut self,
>         off: usize,
>         len: usize,
>         f: impl FnOnce(&mut [u8]) -> Result<T>,
>     ) -> Result<T>
>
>     // writing methods now take &mut self
>     pub fn write(&mut self ...)
>     pub fn fill_zero(&mut self ...)
>     pub fn copy_into_page(&mut self ...)
>
>     // Add two new functions that take &self, but return shared access
>     pub fn with_page_mapped_raw<T>(
>         &self,
>         f: impl FnOnce(&UnsafeCell<[u8; PAGE_SIZE]>) -> T
>     ) -> T
>
>     pub fn with_pointer_into_page_raw<T>(
>         &self,
>         off: usize,
>         len: usize,
>         f: impl FnOnce(&[UnsafeCell<u8>]) -> Result<T>,
>     ) -> Result<T>
>
> This would mean that anyone who can obey rust's mutability rules can
> use a page without any safety or race conditions to worry about, much
> better for usability.

The methods can't be `&mut self` because I need the ability to perform
concurrent writes to disjoint subsets of the page.
Alice Ryhl Feb. 8, 2024, 1:46 p.m. UTC | #15
On Thu, Feb 1, 2024 at 7:02 AM Trevor Gross <tmgross@umich.edu> wrote:
>
> On Wed, Jan 24, 2024 at 6:22 AM Alice Ryhl <aliceryhl@google.com> wrote:
> > +/// A pointer to a page that owns the page allocation.
> > +///
> > +/// # Invariants
> > +///
> > +/// The pointer points at a page, and has ownership over the page.
> > +pub struct Page {
> > +    page: NonNull<bindings::page>,
> > +}
>
> Shouldn't this be UnsafeCell / Opaque? Since `struct page` contains locks.

That only matters when we use a reference. Here, it's behind a raw pointer.

> > +// SAFETY: It is safe to transfer page allocations between threads.
> > +unsafe impl Send for Page {}
> > +
> > +// SAFETY: Calling `&self` methods on this type in parallel is safe. It might
> > +// allow you to perform a data race on bytes stored in the page, but we treat
> > +// this like data races on user pointers.
> > +unsafe impl Sync for Page {}
>
> These races should probably be in the Page docs, rather than pointing
> to user pointers.

New safety comment:

SAFETY: As long as the safety requirements for `&self` methods on this
type are followed, there is no problem with calling them in parallel.

> > +impl Page {
> > +    /// Allocates a new set of contiguous pages.
>
> "set of contiguous page" -> "page"?

Thanks, done.

> > +    pub fn new() -> Result<Self, AllocError> {
> > +        // SAFETY: These are the correct arguments to allocate a single page.
> > +        let page = unsafe {
> > +            bindings::alloc_pages(
> > +                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
> > +                0,
> > +            )
> > +        };
> > +
> > +        match NonNull::new(page) {
> > +            // INVARIANT: We checked that the allocation above succeeded.
> > +            Some(page) => Ok(Self { page }),
> > +            None => Err(AllocError),
> > +        }
>
> Optionally:
>
>     let page = NonNull::new(page).ok_or(AllocError)?;
>     Ok(Self { page })

Done.

> > +    }
> > +
> > +    /// Returns a raw pointer to the page.
>
> Maybe add ", valid for PAGE_SIZE" or similar to make this obvious.

This is a pointer to the `struct page`, not the actual page data.

> > +    pub fn as_ptr(&self) -> *mut bindings::page {
> > +        self.page.as_ptr()
> > +    }
> > +
> > +    /// Runs a piece of code with this page mapped to an address.
>
> Maybe ", then immediately unmaps the page" to make the entire operation clear.

Ok.

> > +    /// It is up to the caller to use the provided raw pointer correctly.
> > +    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut c_void) -> T) -> T {
>
> If there is exclusive access into the page, this signature could be:
>
>     FnOnce(&mut [u8; PAGE_SIZE]) -> T
>
> Otherwise possibly
>
>     FnOnce(*mut [u8; PAGE_SIZE]) -> T
>
> But based on the thread with Boqun it seems there is no synchronized
> access here. In this case, "use the provided raw pointer correctly" or
> the type level docs should clarify what you can and can't rely on with
> pointers into a page.
>
> E.g. if I'm understanding correctly, you can never construct a &T or
> &mut T anywhere in this page unless T is Sync.

We discussed this in the meeting and concluded that we should use *mut u8 here.

> > +    /// Runs a piece of code with a raw pointer to a slice of this page, with
> > +    /// bounds checking.
> > +    ///
> > +    /// If `f` is called, then it will be called with a pointer that points at
> > +    /// `off` bytes into the page, and the pointer will be valid for at least
> > +    /// `len` bytes. The pointer is only valid on this task, as this method uses
> > +    /// a local mapping./
> > +    ///
> > +    /// If `off` and `len` refers to a region outside of this page, then this
> > +    /// method returns `EINVAL` and does not call `f`.
> > +    pub fn with_pointer_into_page<T>(
> > +        &self,
> > +        off: usize,
> > +        len: usize,
> > +        f: impl FnOnce(*mut u8) -> Result<T>,
> > +    ) -> Result<T> {
>
> Same question about exclusive access
>
>     impl FnOnce(&mut [u8]) -> Result<T>

We discussed this in the meeting. Slices raise all sorts of cans of
worms with uninit and exclusivity, so the raw methods won't use them.

> > +        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
> > +
> > +        if bounds_ok {
> > +            self.with_page_mapped(move |page_addr| {
> > +                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
> > +                // result in a pointer that is in bounds or one off the end of the page.
> > +                f(unsafe { page_addr.cast::<u8>().add(off) })
> > +            })
> > +        } else {
> > +            Err(EINVAL)
> > +        }
> > +    }
> > +
> > +    /// Maps the page and reads from it into the given buffer.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// Callers must ensure that `dest` is valid for writing `len` bytes.
> > +    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {
>
> Is there a reason not to use a slice just for a destination to read into?

Ditto.

> > +        self.with_pointer_into_page(offset, len, move |from_ptr| {
>
> Nit: do the names from_ptr/to_ptr come from existing binder? src/dst
> seems more common (also dst vs. dest).

Renamed everything to use src/dst

> > +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> > +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> > +            // it has performed a bounds check and guarantees that `to_ptr` is
> > +            // valid for `len` bytes.
> > +            unsafe { ptr::copy(src, to_ptr, len) };
> > +            Ok(())
> > +        })
> > +    }
> > +
> > +    /// Maps the page and zeroes the given slice.
>
> Mention that this will error with the same conditions as with_pointer_into_page.

That method is private. I will add documentation for this that doesn't
reference with_pointer_into_page.

> > +    pub fn fill_zero(&self, offset: usize, len: usize) -> Result {
> > +        self.with_pointer_into_page(offset, len, move |to_ptr| {
> > +            // SAFETY: If `with_pointer_into_page` calls into this closure, then
> > +            // it has performed a bounds check and guarantees that `to_ptr` is
> > +            // valid for `len` bytes.
> > +            unsafe { ptr::write_bytes(to_ptr, 0u8, len) };
> > +            Ok(())
> > +        })
> > +    }
> > +
> > +    /// Copies data from userspace into this page.
> > +    pub fn copy_into_page(
> > +        &self,
> > +        reader: &mut UserSlicePtrReader,
> > +        offset: usize,
> > +        len: usize,
> > +    ) -> Result {
>
> Maybe copy_from_user_slice or something that includes "user", since
> as-is it sounds like copying a page into another page.
>
> Also, docs should point out the error condition.

Done.

Thanks,
Alice
Andreas Hindborg Feb. 8, 2024, 2:02 p.m. UTC | #16
Alice Ryhl <aliceryhl@google.com> writes:

> On Thu, Feb 1, 2024 at 7:02 AM Trevor Gross <tmgross@umich.edu> wrote:
>>
>> On Wed, Jan 24, 2024 at 6:22 AM Alice Ryhl <aliceryhl@google.com> wrote:
>> > +/// A pointer to a page that owns the page allocation.
>> > +///
>> > +/// # Invariants
>> > +///
>> > +/// The pointer points at a page, and has ownership over the page.
>> > +pub struct Page {
>> > +    page: NonNull<bindings::page>,
>> > +}
>>
>> Shouldn't this be UnsafeCell / Opaque? Since `struct page` contains locks.
>
> That only matters when we use a reference. Here, it's behind a raw pointer.

Why is it behind a pointer rather than being transparent over
`Opaque<bindings::page>` and using a `&Page` instead?

BR Andreas
Alice Ryhl Feb. 8, 2024, 2:12 p.m. UTC | #17
On Thu, Feb 8, 2024 at 3:02 PM Andreas Hindborg <a.hindborg@samsung.com> wrote:
>
>
> Alice Ryhl <aliceryhl@google.com> writes:
>
> > On Thu, Feb 1, 2024 at 7:02 AM Trevor Gross <tmgross@umich.edu> wrote:
> >>
> >> On Wed, Jan 24, 2024 at 6:22 AM Alice Ryhl <aliceryhl@google.com> wrote:
> >> > +/// A pointer to a page that owns the page allocation.
> >> > +///
> >> > +/// # Invariants
> >> > +///
> >> > +/// The pointer points at a page, and has ownership over the page.
> >> > +pub struct Page {
> >> > +    page: NonNull<bindings::page>,
> >> > +}
> >>
> >> Shouldn't this be UnsafeCell / Opaque? Since `struct page` contains locks.
> >
> > That only matters when we use a reference. Here, it's behind a raw pointer.
>
> Why is it behind a pointer rather than being transparent over
> `Opaque<bindings::page>` and using a `&Page` instead?

Because `&Page` would not have ownership of the page, but I need
ownership. We also can't use `ARef<Page>` because that has a `clone`
method.

One could introduce an Owned smart pointer and use `Owned<Page>`, but
I think that is out of scope for this patchset.

Alice
diff mbox series

Patch

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index c0cb4b05b918..7698f5b349d3 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -21,3 +21,4 @@ 
 const size_t RUST_CONST_HELPER_ARCH_SLAB_MINALIGN = ARCH_SLAB_MINALIGN;
 const gfp_t RUST_CONST_HELPER_GFP_KERNEL = GFP_KERNEL;
 const gfp_t RUST_CONST_HELPER___GFP_ZERO = __GFP_ZERO;
+const gfp_t RUST_CONST_HELPER___GFP_HIGHMEM = ___GFP_HIGHMEM;
diff --git a/rust/helpers.c b/rust/helpers.c
index 187f445fbf19..e6541119160b 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -25,6 +25,8 @@ 
 #include <linux/build_bug.h>
 #include <linux/err.h>
 #include <linux/errname.h>
+#include <linux/gfp.h>
+#include <linux/highmem.h>
 #include <linux/mutex.h>
 #include <linux/refcount.h>
 #include <linux/sched/signal.h>
@@ -127,6 +129,24 @@  int rust_helper_signal_pending(struct task_struct *t)
 }
 EXPORT_SYMBOL_GPL(rust_helper_signal_pending);
 
+struct page *rust_helper_alloc_pages(gfp_t gfp_mask, unsigned int order)
+{
+       return alloc_pages(gfp_mask, order);
+}
+EXPORT_SYMBOL_GPL(rust_helper_alloc_pages);
+
+void *rust_helper_kmap_local_page(struct page *page)
+{
+       return kmap_local_page(page);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kmap_local_page);
+
+void rust_helper_kunmap_local(const void *addr)
+{
+       kunmap_local(addr);
+}
+EXPORT_SYMBOL_GPL(rust_helper_kunmap_local);
+
 refcount_t rust_helper_REFCOUNT_INIT(int n)
 {
 	return (refcount_t)REFCOUNT_INIT(n);
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 041233305fda..9f31faf88973 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -41,6 +41,7 @@ 
 pub mod kunit;
 #[cfg(CONFIG_NET)]
 pub mod net;
+pub mod page;
 pub mod prelude;
 pub mod print;
 mod static_assert;
diff --git a/rust/kernel/page.rs b/rust/kernel/page.rs
new file mode 100644
index 000000000000..f83c889d39e3
--- /dev/null
+++ b/rust/kernel/page.rs
@@ -0,0 +1,176 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! Kernel page allocation and management.
+
+use crate::{bindings, error::code::*, error::Result, user_ptr::UserSlicePtrReader};
+use core::{
+    alloc::AllocError,
+    ffi::c_void,
+    ptr::{self, NonNull},
+};
+
+/// A bitwise shift for the page size.
+pub const PAGE_SHIFT: usize = bindings::PAGE_SHIFT as usize;
+/// The number of bytes in a page.
+pub const PAGE_SIZE: usize = 1 << PAGE_SHIFT;
+/// A bitwise mask for the page size.
+pub const PAGE_MASK: usize = PAGE_SIZE - 1;
+
+/// A pointer to a page that owns the page allocation.
+///
+/// # Invariants
+///
+/// The pointer points at a page, and has ownership over the page.
+pub struct Page {
+    page: NonNull<bindings::page>,
+}
+
+// SAFETY: It is safe to transfer page allocations between threads.
+unsafe impl Send for Page {}
+
+// SAFETY: Calling `&self` methods on this type in parallel is safe. It might
+// allow you to perform a data race on bytes stored in the page, but we treat
+// this like data races on user pointers.
+unsafe impl Sync for Page {}
+
+impl Page {
+    /// Allocates a new set of contiguous pages.
+    pub fn new() -> Result<Self, AllocError> {
+        // SAFETY: These are the correct arguments to allocate a single page.
+        let page = unsafe {
+            bindings::alloc_pages(
+                bindings::GFP_KERNEL | bindings::__GFP_ZERO | bindings::__GFP_HIGHMEM,
+                0,
+            )
+        };
+
+        match NonNull::new(page) {
+            // INVARIANT: We checked that the allocation above succeeded.
+            Some(page) => Ok(Self { page }),
+            None => Err(AllocError),
+        }
+    }
+
+    /// Returns a raw pointer to the page.
+    pub fn as_ptr(&self) -> *mut bindings::page {
+        self.page.as_ptr()
+    }
+
+    /// Runs a piece of code with this page mapped to an address.
+    ///
+    /// It is up to the caller to use the provided raw pointer correctly.
+    pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut c_void) -> T) -> T {
+        // SAFETY: `page` is valid due to the type invariants on `Page`.
+        let mapped_addr = unsafe { bindings::kmap_local_page(self.as_ptr()) };
+
+        let res = f(mapped_addr);
+
+        // SAFETY: This unmaps the page mapped above.
+        //
+        // Since this API takes the user code as a closure, it can only be used
+        // in a manner where the pages are unmapped in reverse order. This is as
+        // required by `kunmap_local`.
+        //
+        // In other words, if this call to `kunmap_local` happens when a
+        // different page should be unmapped first, then there must necessarily
+        // be a call to `kmap_local_page` other than the call just above in
+        // `with_page_mapped` that made that possible. In this case, it is the
+        // unsafe block that wraps that other call that is incorrect.
+        unsafe { bindings::kunmap_local(mapped_addr) };
+
+        res
+    }
+
+    /// Runs a piece of code with a raw pointer to a slice of this page, with
+    /// bounds checking.
+    ///
+    /// If `f` is called, then it will be called with a pointer that points at
+    /// `off` bytes into the page, and the pointer will be valid for at least
+    /// `len` bytes. The pointer is only valid on this task, as this method uses
+    /// a local mapping.
+    ///
+    /// If `off` and `len` refers to a region outside of this page, then this
+    /// method returns `EINVAL` and does not call `f`.
+    pub fn with_pointer_into_page<T>(
+        &self,
+        off: usize,
+        len: usize,
+        f: impl FnOnce(*mut u8) -> Result<T>,
+    ) -> Result<T> {
+        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
+
+        if bounds_ok {
+            self.with_page_mapped(move |page_addr| {
+                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
+                // result in a pointer that is in bounds or one off the end of the page.
+                f(unsafe { page_addr.cast::<u8>().add(off) })
+            })
+        } else {
+            Err(EINVAL)
+        }
+    }
+
+    /// Maps the page and reads from it into the given buffer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `dest` is valid for writing `len` bytes.
+    pub unsafe fn read(&self, dest: *mut u8, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |from_ptr| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `from_ptr` is
+            // valid for `len` bytes.
+            unsafe { ptr::copy(from_ptr, dest, len) };
+            Ok(())
+        })
+    }
+
+    /// Maps the page and writes into it from the given buffer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `src` is valid for reading `len` bytes.
+    pub unsafe fn write(&self, src: *const u8, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |to_ptr| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `to_ptr` is
+            // valid for `len` bytes.
+            unsafe { ptr::copy(src, to_ptr, len) };
+            Ok(())
+        })
+    }
+
+    /// Maps the page and zeroes the given slice.
+    pub fn fill_zero(&self, offset: usize, len: usize) -> Result {
+        self.with_pointer_into_page(offset, len, move |to_ptr| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `to_ptr` is
+            // valid for `len` bytes.
+            unsafe { ptr::write_bytes(to_ptr, 0u8, len) };
+            Ok(())
+        })
+    }
+
+    /// Copies data from userspace into this page.
+    pub fn copy_into_page(
+        &self,
+        reader: &mut UserSlicePtrReader,
+        offset: usize,
+        len: usize,
+    ) -> Result {
+        self.with_pointer_into_page(offset, len, move |to_ptr| {
+            // SAFETY: If `with_pointer_into_page` calls into this closure, then
+            // it has performed a bounds check and guarantees that `to_ptr` is
+            // valid for `len` bytes.
+            unsafe { reader.read_raw(to_ptr, len) }
+        })
+    }
+}
+
+impl Drop for Page {
+    fn drop(&mut self) {
+        // SAFETY: By the type invariants, we have ownership of the page and can
+        // free it.
+        unsafe { bindings::__free_pages(self.page.as_ptr(), 0) };
+    }
+}