diff mbox series

[v6,02/16] chardev: introduce cdev_get_by_path()

Message ID 20190725172335.6825-3-logang@deltatee.com (mailing list archive)
State New, archived
Headers show
Series nvmet: add target passthru commands support | expand

Commit Message

Logan Gunthorpe July 25, 2019, 5:23 p.m. UTC
cdev_get_by_path() attempts to retrieve a struct cdev from
a path name. It is analagous to blkdev_get_by_path().

This will be necessary to create a nvme_ctrl_get_by_path()to
support NVMe-OF passthru.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
---
 fs/char_dev.c        | 34 ++++++++++++++++++++++++++++++++++
 include/linux/cdev.h |  1 +
 2 files changed, 35 insertions(+)

Comments

Greg Kroah-Hartman July 25, 2019, 5:40 p.m. UTC | #1
On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> cdev_get_by_path() attempts to retrieve a struct cdev from
> a path name. It is analagous to blkdev_get_by_path().
> 
> This will be necessary to create a nvme_ctrl_get_by_path()to
> support NVMe-OF passthru.

Ick, why?  Why would a cdev have a "pathname"?

What is "NVMe-OF passthru"?  Why does a char device node have anything
to do with NVMe?

We have way too many ways to abuse cdevs today, my long-term-wish has
always been to clean this interface up to make it more sane and unified,
and get rid of the "outliers" (all created at the time for a good
reason, that's not the problem.)  But to add "just one more" seems
really odd to me.

thanks,

greg k-h
Logan Gunthorpe July 25, 2019, 5:53 p.m. UTC | #2
On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
>> cdev_get_by_path() attempts to retrieve a struct cdev from
>> a path name. It is analagous to blkdev_get_by_path().
>>
>> This will be necessary to create a nvme_ctrl_get_by_path()to
>> support NVMe-OF passthru.
> 
> Ick, why?  Why would a cdev have a "pathname"?

So we can go from "/dev/nvme0" (which points to a char device) to its
struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
allows supporting symlinks that might be created by udev rules.

This is very similar to blkdev_get_by_path() that lets regular NVMe-OF
obtain the struct block_device from a path.

I didn't think this would be all that controversial.

> What is "NVMe-OF passthru"?  Why does a char device node have anything
> to do with NVMe?

NVME-OF passthru is support for NVME over fabrics to directly target a
regular NVMe controller and thus export an entire NVMe device to a
remote system. We need to be able to tell the kernel which controller to
use and IMO a path to the device file is the best way as it allows us to
support symlinks created by udev.

> We have way too many ways to abuse cdevs today, my long-term-wish has
> always been to clean this interface up to make it more sane and unified,
> and get rid of the "outliers" (all created at the time for a good
> reason, that's not the problem.)  But to add "just one more" seems
> really odd to me.

Well it doesn't seem all that much like an outlier to me.

Logan
Matthew Wilcox (Oracle) July 25, 2019, 5:58 p.m. UTC | #3
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> >> cdev_get_by_path() attempts to retrieve a struct cdev from
> >> a path name. It is analagous to blkdev_get_by_path().
> >>
> >> This will be necessary to create a nvme_ctrl_get_by_path()to
> >> support NVMe-OF passthru.
> > 
> > Ick, why?  Why would a cdev have a "pathname"?
> 
> So we can go from "/dev/nvme0" (which points to a char device) to its
> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
> allows supporting symlinks that might be created by udev rules.

But you're not really trying to go from a string to a chardev.  You're
trying to go from a nvmet_subsys to a chardev.  Isn't there a better
way to link the two somewhere else?

(I must confess that once I would have known the answer to this, but
the NVMe subsystem has grown ridiculously complex and I can no longer
fit it in my head)
Greg Kroah-Hartman July 25, 2019, 6:08 p.m. UTC | #4
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> >> cdev_get_by_path() attempts to retrieve a struct cdev from
> >> a path name. It is analagous to blkdev_get_by_path().
> >>
> >> This will be necessary to create a nvme_ctrl_get_by_path()to
> >> support NVMe-OF passthru.
> > 
> > Ick, why?  Why would a cdev have a "pathname"?
> 
> So we can go from "/dev/nvme0" (which points to a char device) to its
> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
> allows supporting symlinks that might be created by udev rules.

Why do you have a "string" within the kernel and are not using the
normal open() call from userspace on the character device node on the
filesystem in your namespace/mount/whatever?

Where is this random string coming from?  Why is this so special that no
one else has ever needed it?

thanks,

greg k-h
Logan Gunthorpe July 25, 2019, 6:08 p.m. UTC | #5
On 2019-07-25 11:58 a.m., Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
>>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
>>>> cdev_get_by_path() attempts to retrieve a struct cdev from
>>>> a path name. It is analagous to blkdev_get_by_path().
>>>>
>>>> This will be necessary to create a nvme_ctrl_get_by_path()to
>>>> support NVMe-OF passthru.
>>>
>>> Ick, why?  Why would a cdev have a "pathname"?
>>
>> So we can go from "/dev/nvme0" (which points to a char device) to its
>> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
>> allows supporting symlinks that might be created by udev rules.
> 
> But you're not really trying to go from a string to a chardev.  You're
> trying to go from a nvmet_subsys to a chardev.  Isn't there a better
> way to link the two somewhere else?
> 
> (I must confess that once I would have known the answer to this, but
> the NVMe subsystem has grown ridiculously complex and I can no longer
> fit it in my head)

Well the nvmet_subsys isn't related to the nvme_ctrl (and thus char dev)
at all. An nvmet_subsys is created via configfs and the user has to
specify an NVMe controller for it to use (by writting a string to a
config attribute). The best handle the user has is a path to the
controller's cdev (/dev/nvmeX) so the fabrics code has to be able to
lookup the corresponding struct nvme_ctrl from the path.

This is directly analogous to the way NVMe-of works today: it uses
blkdev_get_by_path() to translate a user provided path to a struct
block_device. The only difference here is that, for passthru, we need a
nvme_ctrl, not a block device.

Logan
Greg Kroah-Hartman July 25, 2019, 6:10 p.m. UTC | #6
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> >> cdev_get_by_path() attempts to retrieve a struct cdev from
> >> a path name. It is analagous to blkdev_get_by_path().
> >>
> >> This will be necessary to create a nvme_ctrl_get_by_path()to
> >> support NVMe-OF passthru.
> > 
> > Ick, why?  Why would a cdev have a "pathname"?
> 
> So we can go from "/dev/nvme0" (which points to a char device) to its
> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
> allows supporting symlinks that might be created by udev rules.
> 
> This is very similar to blkdev_get_by_path() that lets regular NVMe-OF
> obtain the struct block_device from a path.
> 
> I didn't think this would be all that controversial.
> 
> > What is "NVMe-OF passthru"?  Why does a char device node have anything
> > to do with NVMe?
> 
> NVME-OF passthru is support for NVME over fabrics to directly target a
> regular NVMe controller and thus export an entire NVMe device to a
> remote system. We need to be able to tell the kernel which controller to
> use and IMO a path to the device file is the best way as it allows us to
> support symlinks created by udev.

open() in userspace handles symlinks just fine, what crazy interface
passes a string to try to find a char device node that is not open()?

And why do you need a char device at all anyway?  Is this just the
"normal" nvme controller's character device node?

> > We have way too many ways to abuse cdevs today, my long-term-wish has
> > always been to clean this interface up to make it more sane and unified,
> > and get rid of the "outliers" (all created at the time for a good
> > reason, that's not the problem.)  But to add "just one more" seems
> > really odd to me.
> 
> Well it doesn't seem all that much like an outlier to me.

Everyone is special, just like everyone else :)

Seriously, as no one else has ever needed this, you are an outlier.

thanks,

greg k-h
Logan Gunthorpe July 25, 2019, 6:14 p.m. UTC | #7
On 2019-07-25 12:08 p.m., Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
>>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
>>>> cdev_get_by_path() attempts to retrieve a struct cdev from
>>>> a path name. It is analagous to blkdev_get_by_path().
>>>>
>>>> This will be necessary to create a nvme_ctrl_get_by_path()to
>>>> support NVMe-OF passthru.
>>>
>>> Ick, why?  Why would a cdev have a "pathname"?
>>
>> So we can go from "/dev/nvme0" (which points to a char device) to its
>> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
>> allows supporting symlinks that might be created by udev rules.
> 
> Why do you have a "string" within the kernel and are not using the
> normal open() call from userspace on the character device node on the
> filesystem in your namespace/mount/whatever?

NVMe-OF is configured using configfs. The target is specified by the
user writing a path to a configfs attribute. This is the way it works
today but with blkdev_get_by_path()[1]. For the passthru code, we need
to get a nvme_ctrl instead of a block_device, but the principal is the same.

> Where is this random string coming from?  

configfs

> Why is this so special that no
> one else has ever needed it?

People have needed the same functionality for block devices and
blkdev_get_by_path() has multiple users (iscsi, drbd, nvme-of, etc)
which are doing similar things. Nobody has needed to do the same with a
chardev until we wanted the NVMe-of to support targeting an NVMe
controller which is represented in userspace by a char device.

Logan


[1]
https://elixir.bootlin.com/linux/latest/source/drivers/nvme/target/io-cmd-bdev.c#L15
Logan Gunthorpe July 25, 2019, 6:16 p.m. UTC | #8
On 2019-07-25 12:10 p.m., Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
>>
>>
>> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
>>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
>>>> cdev_get_by_path() attempts to retrieve a struct cdev from
>>>> a path name. It is analagous to blkdev_get_by_path().
>>>>
>>>> This will be necessary to create a nvme_ctrl_get_by_path()to
>>>> support NVMe-OF passthru.
>>>
>>> Ick, why?  Why would a cdev have a "pathname"?
>>
>> So we can go from "/dev/nvme0" (which points to a char device) to its
>> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
>> allows supporting symlinks that might be created by udev rules.
>>
>> This is very similar to blkdev_get_by_path() that lets regular NVMe-OF
>> obtain the struct block_device from a path.
>>
>> I didn't think this would be all that controversial.
>>
>>> What is "NVMe-OF passthru"?  Why does a char device node have anything
>>> to do with NVMe?
>>
>> NVME-OF passthru is support for NVME over fabrics to directly target a
>> regular NVMe controller and thus export an entire NVMe device to a
>> remote system. We need to be able to tell the kernel which controller to
>> use and IMO a path to the device file is the best way as it allows us to
>> support symlinks created by udev.
> 
> open() in userspace handles symlinks just fine, what crazy interface
> passes a string to try to find a char device node that is not open()?

configfs. Which I'm stuck with seeing nvme-of already uses that for
configuration and I don't think that's going to change...

> And why do you need a char device at all anyway?  Is this just the
> "normal" nvme controller's character device node?

Yes.

Logan
Greg Kroah-Hartman July 25, 2019, 6:27 p.m. UTC | #9
On Thu, Jul 25, 2019 at 12:14:33PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 12:08 p.m., Greg Kroah-Hartman wrote:
> > On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
> >>
> >>
> >> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> >>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> >>>> cdev_get_by_path() attempts to retrieve a struct cdev from
> >>>> a path name. It is analagous to blkdev_get_by_path().
> >>>>
> >>>> This will be necessary to create a nvme_ctrl_get_by_path()to
> >>>> support NVMe-OF passthru.
> >>>
> >>> Ick, why?  Why would a cdev have a "pathname"?
> >>
> >> So we can go from "/dev/nvme0" (which points to a char device) to its
> >> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also
> >> allows supporting symlinks that might be created by udev rules.
> > 
> > Why do you have a "string" within the kernel and are not using the
> > normal open() call from userspace on the character device node on the
> > filesystem in your namespace/mount/whatever?
> 
> NVMe-OF is configured using configfs. The target is specified by the
> user writing a path to a configfs attribute. This is the way it works
> today but with blkdev_get_by_path()[1]. For the passthru code, we need
> to get a nvme_ctrl instead of a block_device, but the principal is the same.

Why isn't a fd being passed in there instead of a random string?

Seems odd, but oh well, that ship sailed a long time ago for block
devices I guess.

So what do you actually _do_ with that char device once you have it?

greg k-h
Logan Gunthorpe July 25, 2019, 6:36 p.m. UTC | #10
On 2019-07-25 12:27 p.m., Greg Kroah-Hartman wrote:
>>> Why do you have a "string" within the kernel and are not using the
>>> normal open() call from userspace on the character device node on the
>>> filesystem in your namespace/mount/whatever?
>>
>> NVMe-OF is configured using configfs. The target is specified by the
>> user writing a path to a configfs attribute. This is the way it works
>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
> 
> Why isn't a fd being passed in there instead of a random string?

I wouldn't know the answer to this but I assume because once we decided
to use configfs, there was no way for the user to pass the kernel an fd.

> Seems odd, but oh well, that ship sailed a long time ago for block
> devices I guess.

Yup.

> So what do you actually _do_ with that char device once you have it?

We lookup the struct nvme_ctrl and use it to submit passed-through NVMe
commands directly to the controller.

Logan
Matthew Wilcox (Oracle) July 25, 2019, 7 p.m. UTC | #11
On Thu, Jul 25, 2019 at 08:27:01PM +0200, Greg Kroah-Hartman wrote:
> > NVMe-OF is configured using configfs. The target is specified by the
> > user writing a path to a configfs attribute. This is the way it works
> > today but with blkdev_get_by_path()[1]. For the passthru code, we need
> > to get a nvme_ctrl instead of a block_device, but the principal is the same.
> 
> Why isn't a fd being passed in there instead of a random string?

I suppose we could echo a string of the file descriptor number there,
and look up the fd in the process' file descriptor table ...

I'll get my coat.
Sagi Grimberg July 25, 2019, 7:02 p.m. UTC | #12
>>>> Why do you have a "string" within the kernel and are not using the
>>>> normal open() call from userspace on the character device node on the
>>>> filesystem in your namespace/mount/whatever?
>>>
>>> NVMe-OF is configured using configfs. The target is specified by the
>>> user writing a path to a configfs attribute. This is the way it works
>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>
>> Why isn't a fd being passed in there instead of a random string?
> 
> I wouldn't know the answer to this but I assume because once we decided
> to use configfs, there was no way for the user to pass the kernel an fd.

That's definitely not changing. But this is not different than how we
use the block device or file configuration, this just happen to need the
nvme controller chardev now to issue I/O.
Sagi Grimberg July 25, 2019, 7:05 p.m. UTC | #13
>>> NVMe-OF is configured using configfs. The target is specified by the
>>> user writing a path to a configfs attribute. This is the way it works
>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>
>> Why isn't a fd being passed in there instead of a random string?
> 
> I suppose we could echo a string of the file descriptor number there,
> and look up the fd in the process' file descriptor table ...

Assuming that there is a open handle somewhere out there...

> I'll get my coat.

Grab mine too..
Matthew Wilcox (Oracle) July 25, 2019, 7:11 p.m. UTC | #14
On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote:
> 
> > > > NVMe-OF is configured using configfs. The target is specified by the
> > > > user writing a path to a configfs attribute. This is the way it works
> > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need
> > > > to get a nvme_ctrl instead of a block_device, but the principal is the same.
> > > 
> > > Why isn't a fd being passed in there instead of a random string?
> > 
> > I suppose we could echo a string of the file descriptor number there,
> > and look up the fd in the process' file descriptor table ...
> 
> Assuming that there is a open handle somewhere out there...

Well, that's how we'd know that the application echoing /dev/nvme3 into
configfs actually has permission to access /dev/nvme3.  Think about
containers, for example.  It's not exactly safe to mount configfs in a
non-root container since it can access any NVMe device in the system,
not just ones which it's been given permission to access.  Right?
Logan Gunthorpe July 25, 2019, 7:24 p.m. UTC | #15
On 2019-07-25 1:11 p.m., Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote:
>>
>>>>> NVMe-OF is configured using configfs. The target is specified by the
>>>>> user writing a path to a configfs attribute. This is the way it works
>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>>>
>>>> Why isn't a fd being passed in there instead of a random string?
>>>
>>> I suppose we could echo a string of the file descriptor number there,
>>> and look up the fd in the process' file descriptor table ...
>>
>> Assuming that there is a open handle somewhere out there...

Yes, that would be a step backwards from an interface. The user would
then need a special process to open the fd and pass it through configfs.
They couldn't just do it with basic bash commands.

> Well, that's how we'd know that the application echoing /dev/nvme3 into
> configfs actually has permission to access /dev/nvme3.  

It's the kernel that's accessing the device so it has permission. root
permission is required to configure the kernel.

> Think about
> containers, for example.  It's not exactly safe to mount configfs in a
> non-root container since it can access any NVMe device in the system,
> not just ones which it's been given permission to access.  Right?

I don't think it really makes any sense to talk about NVMe-of and
containers. Though, if we did it would be solely on the configuration
interface so that users inside a container might be able to configure a
new target for resources they can see and they'd have to have their own
view into configfs....

Logan
Matthew Wilcox (Oracle) July 25, 2019, 7:26 p.m. UTC | #16
On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote:
> >> Assuming that there is a open handle somewhere out there...
> 
> Yes, that would be a step backwards from an interface. The user would
> then need a special process to open the fd and pass it through configfs.
> They couldn't just do it with basic bash commands.

echo 3 3</dev/nvme3 >/configfs/foor/bar/whatever
Sagi Grimberg July 25, 2019, 7:31 p.m. UTC | #17
>>>>> NVMe-OF is configured using configfs. The target is specified by the
>>>>> user writing a path to a configfs attribute. This is the way it works
>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>>>
>>>> Why isn't a fd being passed in there instead of a random string?
>>>
>>> I suppose we could echo a string of the file descriptor number there,
>>> and look up the fd in the process' file descriptor table ...
>>
>> Assuming that there is a open handle somewhere out there...
> 
> Well, that's how we'd know that the application echoing /dev/nvme3 into
> configfs actually has permission to access /dev/nvme3.

Actually, the application is exposing a target device to someone else,
its actually preferable that it doesn't have access to it as its
possibly can create a consistency hole, but that is usually a root
application anyways... We could verify at least that though..

>  Think about
> containers, for example.  It's not exactly safe to mount configfs in a
> non-root container since it can access any NVMe device in the system,
> not just ones which it's been given permission to access.  Right?

I'm seeing this as an equivalent to an application that is binding a
socket to an ip address, and the kernel looks-up according to the net
namespace that the socket has.

I do agree this is an area that was never really thought of. But what
you are describing requires infrastructure around it instead of forcing
the user to pass an fd to validate around it.
Logan Gunthorpe July 25, 2019, 7:31 p.m. UTC | #18
On 2019-07-25 1:26 p.m., Matthew Wilcox wrote:
> On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote:
>>>> Assuming that there is a open handle somewhere out there...
>>
>> Yes, that would be a step backwards from an interface. The user would
>> then need a special process to open the fd and pass it through configfs.
>> They couldn't just do it with basic bash commands.
> 
> echo 3 3</dev/nvme3 >/configfs/foor/bar/whatever

Neat. I didn't know you could do that.

Logan
Greg Kroah-Hartman July 25, 2019, 7:34 p.m. UTC | #19
On Thu, Jul 25, 2019 at 12:02:30PM -0700, Sagi Grimberg wrote:
> 
> > > > > Why do you have a "string" within the kernel and are not using the
> > > > > normal open() call from userspace on the character device node on the
> > > > > filesystem in your namespace/mount/whatever?
> > > > 
> > > > NVMe-OF is configured using configfs. The target is specified by the
> > > > user writing a path to a configfs attribute. This is the way it works
> > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need
> > > > to get a nvme_ctrl instead of a block_device, but the principal is the same.
> > > 
> > > Why isn't a fd being passed in there instead of a random string?
> > 
> > I wouldn't know the answer to this but I assume because once we decided
> > to use configfs, there was no way for the user to pass the kernel an fd.
> 
> That's definitely not changing. But this is not different than how we
> use the block device or file configuration, this just happen to need the
> nvme controller chardev now to issue I/O.

So, as was kind of alluded to in another part of the thread, what are
you doing about permissions?  It seems that any user/group permissions
are out the window when you have the kernel itself do the opening of the
char device, right?  Why is that ok?  You can pass it _any_ character
device node and away it goes?  What if you give it a "wrong" one?  Char
devices are very different from block devices this way.

thanks,

greg k-h
Sagi Grimberg July 25, 2019, 7:37 p.m. UTC | #20
>>>>>> Why do you have a "string" within the kernel and are not using the
>>>>>> normal open() call from userspace on the character device node on the
>>>>>> filesystem in your namespace/mount/whatever?
>>>>>
>>>>> NVMe-OF is configured using configfs. The target is specified by the
>>>>> user writing a path to a configfs attribute. This is the way it works
>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>>>
>>>> Why isn't a fd being passed in there instead of a random string?
>>>
>>> I wouldn't know the answer to this but I assume because once we decided
>>> to use configfs, there was no way for the user to pass the kernel an fd.
>>
>> That's definitely not changing. But this is not different than how we
>> use the block device or file configuration, this just happen to need the
>> nvme controller chardev now to issue I/O.
> 
> So, as was kind of alluded to in another part of the thread, what are
> you doing about permissions?  It seems that any user/group permissions
> are out the window when you have the kernel itself do the opening of the
> char device, right?  Why is that ok?  You can pass it _any_ character
> device node and away it goes?  What if you give it a "wrong" one?  Char
> devices are very different from block devices this way.

We could condition any configfs operation on capable(CAP_NET_ADMIN) to
close that hole for now..
Logan Gunthorpe July 25, 2019, 7:41 p.m. UTC | #21
On 2019-07-25 1:34 p.m., Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 12:02:30PM -0700, Sagi Grimberg wrote:
>>
>>>>>> Why do you have a "string" within the kernel and are not using the
>>>>>> normal open() call from userspace on the character device node on the
>>>>>> filesystem in your namespace/mount/whatever?
>>>>>
>>>>> NVMe-OF is configured using configfs. The target is specified by the
>>>>> user writing a path to a configfs attribute. This is the way it works
>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>>>
>>>> Why isn't a fd being passed in there instead of a random string?
>>>
>>> I wouldn't know the answer to this but I assume because once we decided
>>> to use configfs, there was no way for the user to pass the kernel an fd.
>>
>> That's definitely not changing. But this is not different than how we
>> use the block device or file configuration, this just happen to need the
>> nvme controller chardev now to issue I/O.
> 
> So, as was kind of alluded to in another part of the thread, what are
> you doing about permissions?  It seems that any user/group permissions
> are out the window when you have the kernel itself do the opening of the
> char device, right?  Why is that ok?  You can pass it _any_ character
> device node and away it goes?  What if you give it a "wrong" one?  Char
> devices are very different from block devices this way.

Well the permission question is no different from the block-device case
we already have. The user has to be root to configure a target so it has
access to the block/char device. Containers and NVMe-of are really not
designed to mix and would take a lot of work to make this make any sense
(And that's way out of scope of what I'm trying to do here and doesn't
change the need for a the cdev_get_by_path()).

If the user specifies a non-nvme char device, it is rejected by the code
in nvme_ctrl_get_by_path() when it compares the fops. See patch 4.

Logan
Greg Kroah-Hartman July 25, 2019, 7:43 p.m. UTC | #22
On Thu, Jul 25, 2019 at 12:37:11PM -0700, Sagi Grimberg wrote:
> 
> > > > > > > Why do you have a "string" within the kernel and are not using the
> > > > > > > normal open() call from userspace on the character device node on the
> > > > > > > filesystem in your namespace/mount/whatever?
> > > > > > 
> > > > > > NVMe-OF is configured using configfs. The target is specified by the
> > > > > > user writing a path to a configfs attribute. This is the way it works
> > > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need
> > > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same.
> > > > > 
> > > > > Why isn't a fd being passed in there instead of a random string?
> > > > 
> > > > I wouldn't know the answer to this but I assume because once we decided
> > > > to use configfs, there was no way for the user to pass the kernel an fd.
> > > 
> > > That's definitely not changing. But this is not different than how we
> > > use the block device or file configuration, this just happen to need the
> > > nvme controller chardev now to issue I/O.
> > 
> > So, as was kind of alluded to in another part of the thread, what are
> > you doing about permissions?  It seems that any user/group permissions
> > are out the window when you have the kernel itself do the opening of the
> > char device, right?  Why is that ok?  You can pass it _any_ character
> > device node and away it goes?  What if you give it a "wrong" one?  Char
> > devices are very different from block devices this way.
> 
> We could condition any configfs operation on capable(CAP_NET_ADMIN) to
> close that hole for now..

Why that specific permission?

And what about the "pass any random char device name" issue?  What
happens if you pass /dev/random/ as the string?

thanks,

greg k-h
Sagi Grimberg July 25, 2019, 7:43 p.m. UTC | #23
>> So, as was kind of alluded to in another part of the thread, what are
>> you doing about permissions?  It seems that any user/group permissions
>> are out the window when you have the kernel itself do the opening of the
>> char device, right?  Why is that ok?  You can pass it _any_ character
>> device node and away it goes?  What if you give it a "wrong" one?  Char
>> devices are very different from block devices this way.
> 
> We could condition any configfs operation on capable(CAP_NET_ADMIN) to
> close that hole for now..

s/NET/SYS/...
Sagi Grimberg July 25, 2019, 7:45 p.m. UTC | #24
>>> So, as was kind of alluded to in another part of the thread, what are
>>> you doing about permissions?  It seems that any user/group permissions
>>> are out the window when you have the kernel itself do the opening of the
>>> char device, right?  Why is that ok?  You can pass it _any_ character
>>> device node and away it goes?  What if you give it a "wrong" one?  Char
>>> devices are very different from block devices this way.
>>
>> We could condition any configfs operation on capable(CAP_NET_ADMIN) to
>> close that hole for now..
> 
> Why that specific permission?

Meant CAP_SYS_ADMIN

> And what about the "pass any random char device name" issue?  What
> happens if you pass /dev/random/ as the string?

What is the difference if the application is opening the device if
it has the wrong path?
Al Viro July 25, 2019, 11:55 p.m. UTC | #25
On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2019-07-25 1:11 p.m., Matthew Wilcox wrote:
> > On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote:
> >>
> >>>>> NVMe-OF is configured using configfs. The target is specified by the
> >>>>> user writing a path to a configfs attribute. This is the way it works
> >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
> >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
> >>>>
> >>>> Why isn't a fd being passed in there instead of a random string?
> >>>
> >>> I suppose we could echo a string of the file descriptor number there,
> >>> and look up the fd in the process' file descriptor table ...
> >>
> >> Assuming that there is a open handle somewhere out there...
> 
> Yes, that would be a step backwards from an interface. The user would
> then need a special process to open the fd and pass it through configfs.
> They couldn't just do it with basic bash commands.

First of all, they can, but... WTF not have filp_open() done right there?
Yes, by name.  With permission checks done.  And pick your object from the
sodding struct file you'll get.

What's the problem?  Why do you need cdev lookups, etc., when you are
dealing with files under your full control?  Just open them and use
->private_data or whatever you set in ->open() to access the damn thing.
All there is to it...
Sagi Grimberg July 26, 2019, 4:29 a.m. UTC | #26
>>>>>>> NVMe-OF is configured using configfs. The target is specified by the
>>>>>>> user writing a path to a configfs attribute. This is the way it works
>>>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need
>>>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same.
>>>>>>
>>>>>> Why isn't a fd being passed in there instead of a random string?
>>>>>
>>>>> I suppose we could echo a string of the file descriptor number there,
>>>>> and look up the fd in the process' file descriptor table ...
>>>>
>>>> Assuming that there is a open handle somewhere out there...
>>
>> Yes, that would be a step backwards from an interface. The user would
>> then need a special process to open the fd and pass it through configfs.
>> They couldn't just do it with basic bash commands.
> 
> First of all, they can, but... WTF not have filp_open() done right there?
> Yes, by name.  With permission checks done.  And pick your object from the
> sodding struct file you'll get.
> 
> What's the problem?  Why do you need cdev lookups, etc., when you are
> dealing with files under your full control?  Just open them and use
> ->private_data or whatever you set in ->open() to access the damn thing.
> All there is to it...
Oh this is so much simpler. There is really no point in using anything
else. Just need to remember to compare f->f_op to what we expect to make
sure that it is indeed the same device class.
Greg Kroah-Hartman July 26, 2019, 7:13 a.m. UTC | #27
On Thu, Jul 25, 2019 at 09:29:40PM -0700, Sagi Grimberg wrote:
> 
> > > > > > > > NVMe-OF is configured using configfs. The target is specified by the
> > > > > > > > user writing a path to a configfs attribute. This is the way it works
> > > > > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need
> > > > > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same.
> > > > > > > 
> > > > > > > Why isn't a fd being passed in there instead of a random string?
> > > > > > 
> > > > > > I suppose we could echo a string of the file descriptor number there,
> > > > > > and look up the fd in the process' file descriptor table ...
> > > > > 
> > > > > Assuming that there is a open handle somewhere out there...
> > > 
> > > Yes, that would be a step backwards from an interface. The user would
> > > then need a special process to open the fd and pass it through configfs.
> > > They couldn't just do it with basic bash commands.
> > 
> > First of all, they can, but... WTF not have filp_open() done right there?
> > Yes, by name.  With permission checks done.  And pick your object from the
> > sodding struct file you'll get.
> > 
> > What's the problem?  Why do you need cdev lookups, etc., when you are
> > dealing with files under your full control?  Just open them and use
> > ->private_data or whatever you set in ->open() to access the damn thing.
> > All there is to it...
> Oh this is so much simpler. There is really no point in using anything
> else. Just need to remember to compare f->f_op to what we expect to make
> sure that it is indeed the same device class.

That is good, that solves the "/dev/random/" issue I was talking about
earlier as well.

Odds are you all can do the same for the blockdev interface as well.

thanks,

greg k-h
Logan Gunthorpe July 26, 2019, 3:46 p.m. UTC | #28
On 2019-07-25 10:29 p.m., Sagi Grimberg wrote:
> 
>>>>>>>> NVMe-OF is configured using configfs. The target is specified by
>>>>>>>> the
>>>>>>>> user writing a path to a configfs attribute. This is the way it
>>>>>>>> works
>>>>>>>> today but with blkdev_get_by_path()[1]. For the passthru code,
>>>>>>>> we need
>>>>>>>> to get a nvme_ctrl instead of a block_device, but the principal
>>>>>>>> is the same.
>>>>>>>
>>>>>>> Why isn't a fd being passed in there instead of a random string?
>>>>>>
>>>>>> I suppose we could echo a string of the file descriptor number there,
>>>>>> and look up the fd in the process' file descriptor table ...
>>>>>
>>>>> Assuming that there is a open handle somewhere out there...
>>>
>>> Yes, that would be a step backwards from an interface. The user would
>>> then need a special process to open the fd and pass it through configfs.
>>> They couldn't just do it with basic bash commands.
>>
>> First of all, they can, but... WTF not have filp_open() done right there?
>> Yes, by name.  With permission checks done.  And pick your object from
>> the
>> sodding struct file you'll get.
>>
>> What's the problem?  Why do you need cdev lookups, etc., when you are
>> dealing with files under your full control?  Just open them and use
>> ->private_data or whatever you set in ->open() to access the damn thing.
>> All there is to it...
> Oh this is so much simpler. There is really no point in using anything
> else. Just need to remember to compare f->f_op to what we expect to make
> sure that it is indeed the same device class.

Yes, that sounds like a good idea. I'll do this for v2.

Thanks,

Logan
diff mbox series

Patch

diff --git a/fs/char_dev.c b/fs/char_dev.c
index 5cc53bd5f654..25037d642ff8 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -22,6 +22,7 @@ 
 #include <linux/mutex.h>
 #include <linux/backing-dev.h>
 #include <linux/tty.h>
+#include <linux/namei.h>
 
 #include "internal.h"
 
@@ -403,6 +404,38 @@  static struct cdev *cdev_lookup(struct inode *inode)
 	return p;
 }
 
+struct cdev *cdev_get_by_path(const char *pathname)
+{
+	struct inode *inode;
+	struct cdev *cdev;
+	struct path path;
+	int error;
+
+	if (!pathname || !*pathname)
+		return ERR_PTR(-EINVAL);
+
+	error = kern_path(pathname, LOOKUP_FOLLOW, &path);
+	if (error)
+		return ERR_PTR(error);
+
+	if (!may_open_dev(&path)) {
+		cdev = ERR_PTR(-EACCES);
+		goto out;
+	}
+
+	inode = d_backing_inode(path.dentry);
+	if (!S_ISCHR(inode->i_mode)) {
+		cdev = ERR_PTR(-EINVAL);
+		goto out;
+	}
+
+	cdev = cdev_lookup(inode);
+
+out:
+	path_put(&path);
+	return cdev;
+}
+
 /*
  * Called every time a character special file is opened
  */
@@ -690,5 +723,6 @@  EXPORT_SYMBOL(cdev_add);
 EXPORT_SYMBOL(cdev_set_parent);
 EXPORT_SYMBOL(cdev_device_add);
 EXPORT_SYMBOL(cdev_device_del);
+EXPORT_SYMBOL(cdev_get_by_path);
 EXPORT_SYMBOL(__register_chrdev);
 EXPORT_SYMBOL(__unregister_chrdev);
diff --git a/include/linux/cdev.h b/include/linux/cdev.h
index 0e8cd6293deb..c7f2df2ca69a 100644
--- a/include/linux/cdev.h
+++ b/include/linux/cdev.h
@@ -24,6 +24,7 @@  void cdev_init(struct cdev *, const struct file_operations *);
 
 struct cdev *cdev_alloc(void);
 
+struct cdev *cdev_get_by_path(const char *pathname);
 void cdev_put(struct cdev *p);
 
 int cdev_add(struct cdev *, dev_t, unsigned);