diff mbox series

[RFC,01/11] rust: add abstraction for struct device

Message ID 20240520172554.182094-2-dakr@redhat.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series Device / Driver and PCI Rust abstractions | expand

Commit Message

Danilo Krummrich May 20, 2024, 5:25 p.m. UTC
Add an (always) reference counted abstraction for a generic struct
device. This abstraction encapsulates existing struct device instances
and manages its reference count.

Subsystems may use this abstraction as a base to abstract subsystem
specific device instances based on a generic struct device.

Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
 rust/helpers.c        |  1 +
 rust/kernel/device.rs | 76 +++++++++++++++++++++++++++++++++++++++++++
 rust/kernel/lib.rs    |  1 +
 3 files changed, 78 insertions(+)
 create mode 100644 rust/kernel/device.rs

Comments

Greg KH May 20, 2024, 6 p.m. UTC | #1
On Mon, May 20, 2024 at 07:25:38PM +0200, Danilo Krummrich wrote:
> Add an (always) reference counted abstraction for a generic struct
> device. This abstraction encapsulates existing struct device instances
> and manages its reference count.
> 
> Subsystems may use this abstraction as a base to abstract subsystem
> specific device instances based on a generic struct device.
> 
> Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> ---
>  rust/helpers.c        |  1 +
>  rust/kernel/device.rs | 76 +++++++++++++++++++++++++++++++++++++++++++

What's the status of moving .rs files next to their respective .c files
in the build system?  Keeping them separate like this just isn't going
to work, sorry.

> --- /dev/null
> +++ b/rust/kernel/device.rs
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Generic devices that are part of the kernel's driver model.
> +//!
> +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)

relative paths for a common "include <linux/device.h" type of thing?
Rust can't handle include paths from directories?

> +
> +use crate::{
> +    bindings,
> +    types::{ARef, Opaque},
> +};
> +use core::ptr;
> +
> +/// A ref-counted device.
> +///
> +/// # Invariants
> +///
> +/// The pointer stored in `Self` is non-null and valid for the lifetime of the ARef instance. In
> +/// particular, the ARef instance owns an increment on underlying object’s reference count.
> +#[repr(transparent)]
> +pub struct Device(Opaque<bindings::device>);
> +
> +impl Device {
> +    /// Creates a new ref-counted instance of an existing device pointer.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.

Callers NEVER care about the reference count of a struct device, anyone
poking in that is asking for trouble.

And why non-NULL?  Can't you check for that here?  Shouldn't you check
for that here?  Many driver core functions can handle a NULL pointer
just fine (i.e. get/put_device() can), why should Rust code assume that
a pointer passed to it from the C layer is going to have stricter rules
than the C layer can provide?

> +    pub unsafe fn from_raw(ptr: *mut bindings::device) -> ARef<Self> {
> +        // SAFETY: By the safety requirements, ptr is valid.
> +        // Initially increase the reference count by one to compensate for the final decrement once
> +        // this newly created `ARef<Device>` instance is dropped.
> +        unsafe { bindings::get_device(ptr) };
> +
> +        // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::device`.
> +        let ptr = ptr.cast::<Self>();
> +
> +        // SAFETY: By the safety requirements, ptr is valid.
> +        unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr)) }
> +    }
> +
> +    /// Obtain the raw `struct device *`.
> +    pub(crate) fn as_raw(&self) -> *mut bindings::device {
> +        self.0.get()
> +    }
> +
> +    /// Convert a raw `struct device` pointer to a `&Device`.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count for
> +    /// the entire duration when the returned reference exists.

Again, non-NULL might not be true, and reference counts are never
tracked by any user EXCEPT to increment/decrement it, you never know if
it is 0 or not, all you know is that if a pointer is given to you by the
driver core to a 'struct device' for a function that it is a valid
reference at that point in time, or maybe NULL, until your function
returns.  Anything after that can not be counted on.

thanks,

greg k-h
Miguel Ojeda May 20, 2024, 6:24 p.m. UTC | #2
On Mon, May 20, 2024 at 8:00 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> What's the status of moving .rs files next to their respective .c files
> in the build system?  Keeping them separate like this just isn't going
> to work, sorry.

In progress.

> > +//! Generic devices that are part of the kernel's driver model.
> > +//!
> > +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
>
> relative paths for a common "include <linux/device.h" type of thing?
> Rust can't handle include paths from directories?

We have the custom `srctree/` notation for that (which the patch should use).

And eventually we should have support for easy-to-use links to C
structs and so on (i.e. to the C docs) -- today I happened to be
talking to the `rustdoc` maintainers about this old idea of ours.

Cheers,
Miguel
Danilo Krummrich May 20, 2024, 8:22 p.m. UTC | #3
On Mon, May 20, 2024 at 08:00:23PM +0200, Greg KH wrote:
> On Mon, May 20, 2024 at 07:25:38PM +0200, Danilo Krummrich wrote:
> > Add an (always) reference counted abstraction for a generic struct
> > device. This abstraction encapsulates existing struct device instances
> > and manages its reference count.
> > 
> > Subsystems may use this abstraction as a base to abstract subsystem
> > specific device instances based on a generic struct device.
> > 
> > Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > ---
> >  rust/helpers.c        |  1 +
> >  rust/kernel/device.rs | 76 +++++++++++++++++++++++++++++++++++++++++++
> 
> What's the status of moving .rs files next to their respective .c files
> in the build system?  Keeping them separate like this just isn't going
> to work, sorry.
> 
> > --- /dev/null
> > +++ b/rust/kernel/device.rs
> > @@ -0,0 +1,76 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Generic devices that are part of the kernel's driver model.
> > +//!
> > +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> 
> relative paths for a common "include <linux/device.h" type of thing?
> Rust can't handle include paths from directories?

Going to change this to `srctree/` as proposed by Miguel.

> 
> > +
> > +use crate::{
> > +    bindings,
> > +    types::{ARef, Opaque},
> > +};
> > +use core::ptr;
> > +
> > +/// A ref-counted device.
> > +///
> > +/// # Invariants
> > +///
> > +/// The pointer stored in `Self` is non-null and valid for the lifetime of the ARef instance. In
> > +/// particular, the ARef instance owns an increment on underlying object’s reference count.
> > +#[repr(transparent)]
> > +pub struct Device(Opaque<bindings::device>);
> > +
> > +impl Device {
> > +    /// Creates a new ref-counted instance of an existing device pointer.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> 
> Callers NEVER care about the reference count of a struct device, anyone
> poking in that is asking for trouble.

That's confusing, if not the caller who's passing the device pointer somewhere,
who else?

Who takes care that a device' reference count is non-zero when a driver's probe
function is called?

It's the same here. The PCI code calls Device::from_raw() from its
probe_callback() function, which is called from the C side. For instance:

extern "C" fn probe_callback(
   pdev: *mut bindings::pci_dev,
   id: *const bindings::pci_device_id,
) -> core::ffi::c_int {
   // SAFETY: This is safe, since the C side guarantees that pdev is a valid,
   // non-null pointer to a struct pci_dev with a non-zero reference count.
   let dev = unsafe { device::Device::from_raw(&mut (*pdev).dev) };

   [...]
}

> 
> And why non-NULL?  Can't you check for that here?  Shouldn't you check
> for that here?  Many driver core functions can handle a NULL pointer
> just fine (i.e. get/put_device() can), why should Rust code assume that
> a pointer passed to it from the C layer is going to have stricter rules
> than the C layer can provide?

We could check for NULL here, but I think it'd be pointless. Even if the pointer
is not NULL, it can still be an invalid one. There is no check we can do to
guarantee safety, hence the function is and remains unsafe and has safety
requirements instead that the caller must guarantee to fulfil.

Like in the example above, probe_callback() can give those guarantees instead.

> 
> > +    pub unsafe fn from_raw(ptr: *mut bindings::device) -> ARef<Self> {
> > +        // SAFETY: By the safety requirements, ptr is valid.
> > +        // Initially increase the reference count by one to compensate for the final decrement once
> > +        // this newly created `ARef<Device>` instance is dropped.
> > +        unsafe { bindings::get_device(ptr) };
> > +
> > +        // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::device`.
> > +        let ptr = ptr.cast::<Self>();
> > +
> > +        // SAFETY: By the safety requirements, ptr is valid.
> > +        unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr)) }
> > +    }
> > +
> > +    /// Obtain the raw `struct device *`.
> > +    pub(crate) fn as_raw(&self) -> *mut bindings::device {
> > +        self.0.get()
> > +    }
> > +
> > +    /// Convert a raw `struct device` pointer to a `&Device`.
> > +    ///
> > +    /// # Safety
> > +    ///
> > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count for
> > +    /// the entire duration when the returned reference exists.
> 

This is the doc comment of pub unsafe fn as_ref<'a>(ptr: *mut bindings::device)
-> &'a Self. Let's keep this context, it's confusing for other readers
otherwise.

> Again, non-NULL might not be true, and reference counts are never

Like above, it's just the safety precondition. Checking for NULL does not
improve the situation, we still need to rely on the pointer being a valid one.

> tracked by any user EXCEPT to increment/decrement it, you never know if

That's the whole point, if one takes an increment on the reference count they
can guarantee it's non-zero.

> it is 0 or not, all you know is that if a pointer is given to you by the
> driver core to a 'struct device' for a function that it is a valid
> reference at that point in time, or maybe NULL, until your function
> returns.  Anything after that can not be counted on.

That's not contradictive to those safety comments. When Device::from_raw()
returns it has increased the reference count of the device by one. And when
Device::as_ref() returns the returned reference' lifetime is bound to the
lifetime of the pointer passed to it.

> 
> thanks,
> 
> greg k-h
>
Greg KH May 21, 2024, 9:24 a.m. UTC | #4
On Mon, May 20, 2024 at 10:22:22PM +0200, Danilo Krummrich wrote:
> On Mon, May 20, 2024 at 08:00:23PM +0200, Greg KH wrote:
> > On Mon, May 20, 2024 at 07:25:38PM +0200, Danilo Krummrich wrote:
> > > Add an (always) reference counted abstraction for a generic struct
> > > device. This abstraction encapsulates existing struct device instances
> > > and manages its reference count.
> > > 
> > > Subsystems may use this abstraction as a base to abstract subsystem
> > > specific device instances based on a generic struct device.
> > > 
> > > Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > > Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > ---
> > >  rust/helpers.c        |  1 +
> > >  rust/kernel/device.rs | 76 +++++++++++++++++++++++++++++++++++++++++++
> > 
> > What's the status of moving .rs files next to their respective .c files
> > in the build system?  Keeping them separate like this just isn't going
> > to work, sorry.
> > 
> > > --- /dev/null
> > > +++ b/rust/kernel/device.rs
> > > @@ -0,0 +1,76 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +//! Generic devices that are part of the kernel's driver model.
> > > +//!
> > > +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> > 
> > relative paths for a common "include <linux/device.h" type of thing?
> > Rust can't handle include paths from directories?
> 
> Going to change this to `srctree/` as proposed by Miguel.
> 
> > 
> > > +
> > > +use crate::{
> > > +    bindings,
> > > +    types::{ARef, Opaque},
> > > +};
> > > +use core::ptr;
> > > +
> > > +/// A ref-counted device.
> > > +///
> > > +/// # Invariants
> > > +///
> > > +/// The pointer stored in `Self` is non-null and valid for the lifetime of the ARef instance. In
> > > +/// particular, the ARef instance owns an increment on underlying object’s reference count.
> > > +#[repr(transparent)]
> > > +pub struct Device(Opaque<bindings::device>);
> > > +
> > > +impl Device {
> > > +    /// Creates a new ref-counted instance of an existing device pointer.
> > > +    ///
> > > +    /// # Safety
> > > +    ///
> > > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> > 
> > Callers NEVER care about the reference count of a struct device, anyone
> > poking in that is asking for trouble.
> 
> That's confusing, if not the caller who's passing the device pointer somewhere,
> who else?
> 
> Who takes care that a device' reference count is non-zero when a driver's probe
> function is called?

A device's reference count will be non-zero, I'm saying that sometimes,
some driver core functions are called with a 'struct device' that is
NULL, and it can handle it just fine.  Hopefully no callbacks to the
rust code will happen that way, but why aren't you checking just "to be
sure!" otherwise you could have a bug here, and it costs nothing to
verify it, right?

> It's the same here. The PCI code calls Device::from_raw() from its
> probe_callback() function, which is called from the C side. For instance:
> 
> extern "C" fn probe_callback(
>    pdev: *mut bindings::pci_dev,
>    id: *const bindings::pci_device_id,
> ) -> core::ffi::c_int {
>    // SAFETY: This is safe, since the C side guarantees that pdev is a valid,
>    // non-null pointer to a struct pci_dev with a non-zero reference count.
>    let dev = unsafe { device::Device::from_raw(&mut (*pdev).dev) };

Yes, that's fine, if you are calling this from a probe callback, but
again, the driver core has lots of functions in it :)

>    [...]
> }
> 
> > 
> > And why non-NULL?  Can't you check for that here?  Shouldn't you check
> > for that here?  Many driver core functions can handle a NULL pointer
> > just fine (i.e. get/put_device() can), why should Rust code assume that
> > a pointer passed to it from the C layer is going to have stricter rules
> > than the C layer can provide?
> 
> We could check for NULL here, but I think it'd be pointless. Even if the pointer
> is not NULL, it can still be an invalid one. There is no check we can do to
> guarantee safety, hence the function is and remains unsafe and has safety
> requirements instead that the caller must guarantee to fulfil.
> 
> Like in the example above, probe_callback() can give those guarantees instead.

Ok, if you say so, should we bookmark this thread for when this does
happen?  :)

What will the rust code do if it is passed in a NULL pointer?  Will it
crash like C code does?  Or something else?

thanks,

greg k-h
Danilo Krummrich May 21, 2024, 8:42 p.m. UTC | #5
On Tue, May 21, 2024 at 11:24:38AM +0200, Greg KH wrote:
> On Mon, May 20, 2024 at 10:22:22PM +0200, Danilo Krummrich wrote:
> > On Mon, May 20, 2024 at 08:00:23PM +0200, Greg KH wrote:
> > > On Mon, May 20, 2024 at 07:25:38PM +0200, Danilo Krummrich wrote:
> > > > Add an (always) reference counted abstraction for a generic struct
> > > > device. This abstraction encapsulates existing struct device instances
> > > > and manages its reference count.
> > > > 
> > > > Subsystems may use this abstraction as a base to abstract subsystem
> > > > specific device instances based on a generic struct device.
> > > > 
> > > > Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > > > Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
> > > > Signed-off-by: Danilo Krummrich <dakr@redhat.com>
> > > > ---
> > > >  rust/helpers.c        |  1 +
> > > >  rust/kernel/device.rs | 76 +++++++++++++++++++++++++++++++++++++++++++
> > > 
> > > What's the status of moving .rs files next to their respective .c files
> > > in the build system?  Keeping them separate like this just isn't going
> > > to work, sorry.
> > > 
> > > > --- /dev/null
> > > > +++ b/rust/kernel/device.rs
> > > > @@ -0,0 +1,76 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +//! Generic devices that are part of the kernel's driver model.
> > > > +//!
> > > > +//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
> > > 
> > > relative paths for a common "include <linux/device.h" type of thing?
> > > Rust can't handle include paths from directories?
> > 
> > Going to change this to `srctree/` as proposed by Miguel.
> > 
> > > 
> > > > +
> > > > +use crate::{
> > > > +    bindings,
> > > > +    types::{ARef, Opaque},
> > > > +};
> > > > +use core::ptr;
> > > > +
> > > > +/// A ref-counted device.
> > > > +///
> > > > +/// # Invariants
> > > > +///
> > > > +/// The pointer stored in `Self` is non-null and valid for the lifetime of the ARef instance. In
> > > > +/// particular, the ARef instance owns an increment on underlying object’s reference count.
> > > > +#[repr(transparent)]
> > > > +pub struct Device(Opaque<bindings::device>);
> > > > +
> > > > +impl Device {
> > > > +    /// Creates a new ref-counted instance of an existing device pointer.
> > > > +    ///
> > > > +    /// # Safety
> > > > +    ///
> > > > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> > > 
> > > Callers NEVER care about the reference count of a struct device, anyone
> > > poking in that is asking for trouble.
> > 
> > That's confusing, if not the caller who's passing the device pointer somewhere,
> > who else?
> > 
> > Who takes care that a device' reference count is non-zero when a driver's probe
> > function is called?
> 
> A device's reference count will be non-zero, I'm saying that sometimes,
> some driver core functions are called with a 'struct device' that is
> NULL, and it can handle it just fine.  Hopefully no callbacks to the
> rust code will happen that way, but why aren't you checking just "to be
> sure!" otherwise you could have a bug here, and it costs nothing to
> verify it, right?

I get your point on that one. But let me explain a bit more why I think that
check is not overly helpful here.

In Rust we have the concept of marking functions as 'unsafe'. Unsafe functions
need to document their safety preconsitions, i.e. the conditions the caller of
the function must guarantee. The caller of an unsafe function needs an unsafe
block for it to compile and every unsafe block needs an explanation why it is
safe to call this function with the corresponding arguments.

(Ideally, we want to avoid having them in the first place, but for C abstractions
we have to deal with raw pointers we receive from the C side and dereferencing a
raw pointer is unsafe by definition.)

In this case we have a function that constructs the Rust `Device` structure from
a raw (device) pointer we potentially received from the C side. Now we have to
decide whether this function is going to be unsafe or safe.

In order for this function to be safe we would need to be able to guarantee that
this is a valid, non-null pointer with a non-zero reference count, which
unfortunately we can't. Hence, it's going to be an unsafe function.

A NULL pointer check would not make it a safe function either, since the pointer
could still be an invalid one, or a pointer to a device it's not guaranteed that
the reference count is held up for the duration of the function call.

Given that, we could add the NULL check and change the safety precondition to
"valid pointer to a device with non-zero reference count OR NULL", but I don't
see how this improves the situation for the caller, plus we'd need to return
`Result<Device>` instead and let the caller handle that the `Device` was not
created.

> 
> > It's the same here. The PCI code calls Device::from_raw() from its
> > probe_callback() function, which is called from the C side. For instance:
> > 
> > extern "C" fn probe_callback(
> >    pdev: *mut bindings::pci_dev,
> >    id: *const bindings::pci_device_id,
> > ) -> core::ffi::c_int {
> >    // SAFETY: This is safe, since the C side guarantees that pdev is a valid,
> >    // non-null pointer to a struct pci_dev with a non-zero reference count.
> >    let dev = unsafe { device::Device::from_raw(&mut (*pdev).dev) };
> 
> Yes, that's fine, if you are calling this from a probe callback, but
> again, the driver core has lots of functions in it :)

See explanation above.

> 
> >    [...]
> > }
> > 
> > > 
> > > And why non-NULL?  Can't you check for that here?  Shouldn't you check
> > > for that here?  Many driver core functions can handle a NULL pointer
> > > just fine (i.e. get/put_device() can), why should Rust code assume that
> > > a pointer passed to it from the C layer is going to have stricter rules
> > > than the C layer can provide?
> > 
> > We could check for NULL here, but I think it'd be pointless. Even if the pointer
> > is not NULL, it can still be an invalid one. There is no check we can do to
> > guarantee safety, hence the function is and remains unsafe and has safety
> > requirements instead that the caller must guarantee to fulfil.
> > 
> > Like in the example above, probe_callback() can give those guarantees instead.
> 
> Ok, if you say so, should we bookmark this thread for when this does
> happen?  :)

I'm just saying the caller has to validate that or provide a rationale why this
is safe anyways, hence it'd be just a duplicate check.

> 
> What will the rust code do if it is passed in a NULL pointer?  Will it
> crash like C code does?  Or something else?

It mostly calls into C functions with this pointer, depends on what they do.

Checking a few random places, e.g. [1], it seems to crash in most cases.

[1] https://elixir.free-electrons.com/linux/latest/source/drivers/base/core.c#L3863

> 
> thanks,
> 
> greg k-h
>
Greg KH June 4, 2024, 2:17 p.m. UTC | #6
On Tue, May 21, 2024 at 10:42:03PM +0200, Danilo Krummrich wrote:
> On Tue, May 21, 2024 at 11:24:38AM +0200, Greg KH wrote:
> > On Mon, May 20, 2024 at 10:22:22PM +0200, Danilo Krummrich wrote:
> > > > > +impl Device {
> > > > > +    /// Creates a new ref-counted instance of an existing device pointer.
> > > > > +    ///
> > > > > +    /// # Safety
> > > > > +    ///
> > > > > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> > > > 
> > > > Callers NEVER care about the reference count of a struct device, anyone
> > > > poking in that is asking for trouble.
> > > 
> > > That's confusing, if not the caller who's passing the device pointer somewhere,
> > > who else?
> > > 
> > > Who takes care that a device' reference count is non-zero when a driver's probe
> > > function is called?
> > 
> > A device's reference count will be non-zero, I'm saying that sometimes,
> > some driver core functions are called with a 'struct device' that is
> > NULL, and it can handle it just fine.  Hopefully no callbacks to the
> > rust code will happen that way, but why aren't you checking just "to be
> > sure!" otherwise you could have a bug here, and it costs nothing to
> > verify it, right?
> 
> I get your point on that one. But let me explain a bit more why I think that
> check is not overly helpful here.
> 
> In Rust we have the concept of marking functions as 'unsafe'. Unsafe functions
> need to document their safety preconsitions, i.e. the conditions the caller of
> the function must guarantee. The caller of an unsafe function needs an unsafe
> block for it to compile and every unsafe block needs an explanation why it is
> safe to call this function with the corresponding arguments.
> 
> (Ideally, we want to avoid having them in the first place, but for C abstractions
> we have to deal with raw pointers we receive from the C side and dereferencing a
> raw pointer is unsafe by definition.)
> 
> In this case we have a function that constructs the Rust `Device` structure from
> a raw (device) pointer we potentially received from the C side. Now we have to
> decide whether this function is going to be unsafe or safe.
> 
> In order for this function to be safe we would need to be able to guarantee that
> this is a valid, non-null pointer with a non-zero reference count, which
> unfortunately we can't. Hence, it's going to be an unsafe function.

But you can verify it is non-null, so why not?

> A NULL pointer check would not make it a safe function either, since the pointer
> could still be an invalid one, or a pointer to a device it's not guaranteed that
> the reference count is held up for the duration of the function call.

True, but you just took one huge swatch of "potential crashes" off the
table.  To ignore that feels odd.

> Given that, we could add the NULL check and change the safety precondition to
> "valid pointer to a device with non-zero reference count OR NULL", but I don't
> see how this improves the situation for the caller, plus we'd need to return
> `Result<Device>` instead and let the caller handle that the `Device` was not
> created.

It better be able to handle if `Device` was not created, as you could
have been out of memory and nothing would have been allocated.  To not
check feels very broken.

> > Ok, if you say so, should we bookmark this thread for when this does
> > happen?  :)
> 
> I'm just saying the caller has to validate that or provide a rationale why this
> is safe anyways, hence it'd be just a duplicate check.
> 
> > 
> > What will the rust code do if it is passed in a NULL pointer?  Will it
> > crash like C code does?  Or something else?
> 
> It mostly calls into C functions with this pointer, depends on what they do.
> 
> Checking a few random places, e.g. [1], it seems to crash in most cases.
> 
> [1] https://elixir.free-electrons.com/linux/latest/source/drivers/base/core.c#L3863

Great, then you should check :)

thanks,

greg k-h
Danilo Krummrich June 4, 2024, 4:23 p.m. UTC | #7
On Tue, Jun 04, 2024 at 04:17:29PM +0200, Greg KH wrote:
> On Tue, May 21, 2024 at 10:42:03PM +0200, Danilo Krummrich wrote:
> > On Tue, May 21, 2024 at 11:24:38AM +0200, Greg KH wrote:
> > > On Mon, May 20, 2024 at 10:22:22PM +0200, Danilo Krummrich wrote:
> > > > > > +impl Device {
> > > > > > +    /// Creates a new ref-counted instance of an existing device pointer.
> > > > > > +    ///
> > > > > > +    /// # Safety
> > > > > > +    ///
> > > > > > +    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
> > > > > 
> > > > > Callers NEVER care about the reference count of a struct device, anyone
> > > > > poking in that is asking for trouble.
> > > > 
> > > > That's confusing, if not the caller who's passing the device pointer somewhere,
> > > > who else?
> > > > 
> > > > Who takes care that a device' reference count is non-zero when a driver's probe
> > > > function is called?
> > > 
> > > A device's reference count will be non-zero, I'm saying that sometimes,
> > > some driver core functions are called with a 'struct device' that is
> > > NULL, and it can handle it just fine.  Hopefully no callbacks to the
> > > rust code will happen that way, but why aren't you checking just "to be
> > > sure!" otherwise you could have a bug here, and it costs nothing to
> > > verify it, right?
> > 
> > I get your point on that one. But let me explain a bit more why I think that
> > check is not overly helpful here.
> > 
> > In Rust we have the concept of marking functions as 'unsafe'. Unsafe functions
> > need to document their safety preconsitions, i.e. the conditions the caller of
> > the function must guarantee. The caller of an unsafe function needs an unsafe
> > block for it to compile and every unsafe block needs an explanation why it is
> > safe to call this function with the corresponding arguments.
> > 
> > (Ideally, we want to avoid having them in the first place, but for C abstractions
> > we have to deal with raw pointers we receive from the C side and dereferencing a
> > raw pointer is unsafe by definition.)
> > 
> > In this case we have a function that constructs the Rust `Device` structure from
> > a raw (device) pointer we potentially received from the C side. Now we have to
> > decide whether this function is going to be unsafe or safe.
> > 
> > In order for this function to be safe we would need to be able to guarantee that
> > this is a valid, non-null pointer with a non-zero reference count, which
> > unfortunately we can't. Hence, it's going to be an unsafe function.
> 
> But you can verify it is non-null, so why not?

I suggest to check out the code making use of this.

From the PCI abstractions:

    extern "C" fn probe_callback(
        pdev: *mut bindings::pci_dev,
        id: *const bindings::pci_device_id,
    ) -> core::ffi::c_int {
        // SAFETY: Safe because the core kernel only ever calls the probe callback with a valid
        // `pdev`.
        let dev = unsafe { device::Device::from_raw(&mut (*pdev).dev) };

        [...]
    }

Doing the NULL check would turn this into something like:

    extern "C" fn probe_callback(
        pdev: *mut bindings::pci_dev,
        id: *const bindings::pci_device_id,
    ) -> core::ffi::c_int {
        // SAFETY: Safe because the core kernel only ever calls the probe callback with a valid
        // `pdev`, but we still have to handle `Device::from_raw`'s NULL check.
        let dev = match unsafe { device::Device::from_raw(&mut (*pdev).dev) } {
           Ok(dev) => dev,
           Err(err) => return Error::to_errno(err),
        }
    }

This would be super odd. If `Device::from_raw` reports "Ok" it actually wouldn't
mean everything is well. It would *only* mean that the pointer that was passed
is not NULL. This is counter intuitive; IMHO unsafe functions shouldn't return
any type of result, because it just isn't meaningful.

> 
> > A NULL pointer check would not make it a safe function either, since the pointer
> > could still be an invalid one, or a pointer to a device it's not guaranteed that
> > the reference count is held up for the duration of the function call.
> 
> True, but you just took one huge swatch of "potential crashes" off the
> table.  To ignore that feels odd.
> 
> > Given that, we could add the NULL check and change the safety precondition to
> > "valid pointer to a device with non-zero reference count OR NULL", but I don't
> > see how this improves the situation for the caller, plus we'd need to return
> > `Result<Device>` instead and let the caller handle that the `Device` was not
> > created.
> 
> It better be able to handle if `Device` was not created, as you could
> have been out of memory and nothing would have been allocated.  To not
> check feels very broken.

The abstraction is not allocating a new C struct device, it's just abstracting a
pointer to an existing struct device. There is no OOM case to handle, the
abstraction holding the pointer lives on the stack.

> 
> > > Ok, if you say so, should we bookmark this thread for when this does
> > > happen?  :)
> > 
> > I'm just saying the caller has to validate that or provide a rationale why this
> > is safe anyways, hence it'd be just a duplicate check.
> > 
> > > 
> > > What will the rust code do if it is passed in a NULL pointer?  Will it
> > > crash like C code does?  Or something else?
> > 
> > It mostly calls into C functions with this pointer, depends on what they do.
> > 
> > Checking a few random places, e.g. [1], it seems to crash in most cases.
> > 
> > [1] https://elixir.free-electrons.com/linux/latest/source/drivers/base/core.c#L3863
> 
> Great, then you should check :)

Why isn't the conclusion that the C code should check (as well)? :) Would you
want to add a NULL check at the beginning of device_del()?

In Rust we have a clear separation between safe and unsafe functions with, for
the latter, documented requirements on what's actually allowed to pass in and
which preconditions must be guaranteed. The check happens, just not within the
unsafe function.

> 
> thanks,
> 
> greg k-h
>
diff mbox series

Patch

diff --git a/rust/helpers.c b/rust/helpers.c
index 4c8b7b92a4f4..f9d2db1d1f33 100644
--- a/rust/helpers.c
+++ b/rust/helpers.c
@@ -23,6 +23,7 @@ 
 #include <kunit/test-bug.h>
 #include <linux/bug.h>
 #include <linux/build_bug.h>
+#include <linux/device.h>
 #include <linux/err.h>
 #include <linux/errname.h>
 #include <linux/mutex.h>
diff --git a/rust/kernel/device.rs b/rust/kernel/device.rs
new file mode 100644
index 000000000000..fafec70effb6
--- /dev/null
+++ b/rust/kernel/device.rs
@@ -0,0 +1,76 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+//! Generic devices that are part of the kernel's driver model.
+//!
+//! C header: [`include/linux/device.h`](../../../../include/linux/device.h)
+
+use crate::{
+    bindings,
+    types::{ARef, Opaque},
+};
+use core::ptr;
+
+/// A ref-counted device.
+///
+/// # Invariants
+///
+/// The pointer stored in `Self` is non-null and valid for the lifetime of the ARef instance. In
+/// particular, the ARef instance owns an increment on underlying object’s reference count.
+#[repr(transparent)]
+pub struct Device(Opaque<bindings::device>);
+
+impl Device {
+    /// Creates a new ref-counted instance of an existing device pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count.
+    pub unsafe fn from_raw(ptr: *mut bindings::device) -> ARef<Self> {
+        // SAFETY: By the safety requirements, ptr is valid.
+        // Initially increase the reference count by one to compensate for the final decrement once
+        // this newly created `ARef<Device>` instance is dropped.
+        unsafe { bindings::get_device(ptr) };
+
+        // CAST: `Self` is a `repr(transparent)` wrapper around `bindings::device`.
+        let ptr = ptr.cast::<Self>();
+
+        // SAFETY: By the safety requirements, ptr is valid.
+        unsafe { ARef::from_raw(ptr::NonNull::new_unchecked(ptr)) }
+    }
+
+    /// Obtain the raw `struct device *`.
+    pub(crate) fn as_raw(&self) -> *mut bindings::device {
+        self.0.get()
+    }
+
+    /// Convert a raw `struct device` pointer to a `&Device`.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `ptr` is valid, non-null, and has a non-zero reference count for
+    /// the entire duration when the returned reference exists.
+    pub unsafe fn as_ref<'a>(ptr: *mut bindings::device) -> &'a Self {
+        // SAFETY: Guaranteed by the safety requirements of the function.
+        unsafe { &*ptr.cast() }
+    }
+}
+
+// SAFETY: Instances of `Device` are always ref-counted.
+unsafe impl crate::types::AlwaysRefCounted for Device {
+    fn inc_ref(&self) {
+        // SAFETY: The existence of a shared reference guarantees that the refcount is nonzero.
+        unsafe { bindings::get_device(self.as_raw()) };
+    }
+
+    unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
+        // SAFETY: The safety requirements guarantee that the refcount is nonzero.
+        unsafe { bindings::put_device(obj.cast().as_ptr()) }
+    }
+}
+
+// SAFETY: `Device` only holds a pointer to a C device, which is safe to be used from any thread.
+unsafe impl Send for Device {}
+
+// SAFETY: `Device` only holds a pointer to a C device, references to which are safe to be used
+// from any thread.
+unsafe impl Sync for Device {}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index 9a943d99c71a..4ba3d4a49e9c 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -28,6 +28,7 @@ 
 
 pub mod alloc;
 mod build_assert;
+pub mod device;
 pub mod error;
 pub mod init;
 pub mod ioctl;