Message ID | 20190725172335.6825-3-logang@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | nvmet: add target passthru commands support | expand |
On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > cdev_get_by_path() attempts to retrieve a struct cdev from > a path name. It is analagous to blkdev_get_by_path(). > > This will be necessary to create a nvme_ctrl_get_by_path()to > support NVMe-OF passthru. Ick, why? Why would a cdev have a "pathname"? What is "NVMe-OF passthru"? Why does a char device node have anything to do with NVMe? We have way too many ways to abuse cdevs today, my long-term-wish has always been to clean this interface up to make it more sane and unified, and get rid of the "outliers" (all created at the time for a good reason, that's not the problem.) But to add "just one more" seems really odd to me. thanks, greg k-h
On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: >> cdev_get_by_path() attempts to retrieve a struct cdev from >> a path name. It is analagous to blkdev_get_by_path(). >> >> This will be necessary to create a nvme_ctrl_get_by_path()to >> support NVMe-OF passthru. > > Ick, why? Why would a cdev have a "pathname"? So we can go from "/dev/nvme0" (which points to a char device) to its struct cdev and eventually it's struct nvme_ctrl. Doing it this way also allows supporting symlinks that might be created by udev rules. This is very similar to blkdev_get_by_path() that lets regular NVMe-OF obtain the struct block_device from a path. I didn't think this would be all that controversial. > What is "NVMe-OF passthru"? Why does a char device node have anything > to do with NVMe? NVME-OF passthru is support for NVME over fabrics to directly target a regular NVMe controller and thus export an entire NVMe device to a remote system. We need to be able to tell the kernel which controller to use and IMO a path to the device file is the best way as it allows us to support symlinks created by udev. > We have way too many ways to abuse cdevs today, my long-term-wish has > always been to clean this interface up to make it more sane and unified, > and get rid of the "outliers" (all created at the time for a good > reason, that's not the problem.) But to add "just one more" seems > really odd to me. Well it doesn't seem all that much like an outlier to me. Logan
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > >> cdev_get_by_path() attempts to retrieve a struct cdev from > >> a path name. It is analagous to blkdev_get_by_path(). > >> > >> This will be necessary to create a nvme_ctrl_get_by_path()to > >> support NVMe-OF passthru. > > > > Ick, why? Why would a cdev have a "pathname"? > > So we can go from "/dev/nvme0" (which points to a char device) to its > struct cdev and eventually it's struct nvme_ctrl. Doing it this way also > allows supporting symlinks that might be created by udev rules. But you're not really trying to go from a string to a chardev. You're trying to go from a nvmet_subsys to a chardev. Isn't there a better way to link the two somewhere else? (I must confess that once I would have known the answer to this, but the NVMe subsystem has grown ridiculously complex and I can no longer fit it in my head)
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > >> cdev_get_by_path() attempts to retrieve a struct cdev from > >> a path name. It is analagous to blkdev_get_by_path(). > >> > >> This will be necessary to create a nvme_ctrl_get_by_path()to > >> support NVMe-OF passthru. > > > > Ick, why? Why would a cdev have a "pathname"? > > So we can go from "/dev/nvme0" (which points to a char device) to its > struct cdev and eventually it's struct nvme_ctrl. Doing it this way also > allows supporting symlinks that might be created by udev rules. Why do you have a "string" within the kernel and are not using the normal open() call from userspace on the character device node on the filesystem in your namespace/mount/whatever? Where is this random string coming from? Why is this so special that no one else has ever needed it? thanks, greg k-h
On 2019-07-25 11:58 a.m., Matthew Wilcox wrote: > On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: >> >> >> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: >>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: >>>> cdev_get_by_path() attempts to retrieve a struct cdev from >>>> a path name. It is analagous to blkdev_get_by_path(). >>>> >>>> This will be necessary to create a nvme_ctrl_get_by_path()to >>>> support NVMe-OF passthru. >>> >>> Ick, why? Why would a cdev have a "pathname"? >> >> So we can go from "/dev/nvme0" (which points to a char device) to its >> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also >> allows supporting symlinks that might be created by udev rules. > > But you're not really trying to go from a string to a chardev. You're > trying to go from a nvmet_subsys to a chardev. Isn't there a better > way to link the two somewhere else? > > (I must confess that once I would have known the answer to this, but > the NVMe subsystem has grown ridiculously complex and I can no longer > fit it in my head) Well the nvmet_subsys isn't related to the nvme_ctrl (and thus char dev) at all. An nvmet_subsys is created via configfs and the user has to specify an NVMe controller for it to use (by writting a string to a config attribute). The best handle the user has is a path to the controller's cdev (/dev/nvmeX) so the fabrics code has to be able to lookup the corresponding struct nvme_ctrl from the path. This is directly analogous to the way NVMe-of works today: it uses blkdev_get_by_path() to translate a user provided path to a struct block_device. The only difference here is that, for passthru, we need a nvme_ctrl, not a block device. Logan
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > >> cdev_get_by_path() attempts to retrieve a struct cdev from > >> a path name. It is analagous to blkdev_get_by_path(). > >> > >> This will be necessary to create a nvme_ctrl_get_by_path()to > >> support NVMe-OF passthru. > > > > Ick, why? Why would a cdev have a "pathname"? > > So we can go from "/dev/nvme0" (which points to a char device) to its > struct cdev and eventually it's struct nvme_ctrl. Doing it this way also > allows supporting symlinks that might be created by udev rules. > > This is very similar to blkdev_get_by_path() that lets regular NVMe-OF > obtain the struct block_device from a path. > > I didn't think this would be all that controversial. > > > What is "NVMe-OF passthru"? Why does a char device node have anything > > to do with NVMe? > > NVME-OF passthru is support for NVME over fabrics to directly target a > regular NVMe controller and thus export an entire NVMe device to a > remote system. We need to be able to tell the kernel which controller to > use and IMO a path to the device file is the best way as it allows us to > support symlinks created by udev. open() in userspace handles symlinks just fine, what crazy interface passes a string to try to find a char device node that is not open()? And why do you need a char device at all anyway? Is this just the "normal" nvme controller's character device node? > > We have way too many ways to abuse cdevs today, my long-term-wish has > > always been to clean this interface up to make it more sane and unified, > > and get rid of the "outliers" (all created at the time for a good > > reason, that's not the problem.) But to add "just one more" seems > > really odd to me. > > Well it doesn't seem all that much like an outlier to me. Everyone is special, just like everyone else :) Seriously, as no one else has ever needed this, you are an outlier. thanks, greg k-h
On 2019-07-25 12:08 p.m., Greg Kroah-Hartman wrote: > On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: >> >> >> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: >>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: >>>> cdev_get_by_path() attempts to retrieve a struct cdev from >>>> a path name. It is analagous to blkdev_get_by_path(). >>>> >>>> This will be necessary to create a nvme_ctrl_get_by_path()to >>>> support NVMe-OF passthru. >>> >>> Ick, why? Why would a cdev have a "pathname"? >> >> So we can go from "/dev/nvme0" (which points to a char device) to its >> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also >> allows supporting symlinks that might be created by udev rules. > > Why do you have a "string" within the kernel and are not using the > normal open() call from userspace on the character device node on the > filesystem in your namespace/mount/whatever? NVMe-OF is configured using configfs. The target is specified by the user writing a path to a configfs attribute. This is the way it works today but with blkdev_get_by_path()[1]. For the passthru code, we need to get a nvme_ctrl instead of a block_device, but the principal is the same. > Where is this random string coming from? configfs > Why is this so special that no > one else has ever needed it? People have needed the same functionality for block devices and blkdev_get_by_path() has multiple users (iscsi, drbd, nvme-of, etc) which are doing similar things. Nobody has needed to do the same with a chardev until we wanted the NVMe-of to support targeting an NVMe controller which is represented in userspace by a char device. Logan [1] https://elixir.bootlin.com/linux/latest/source/drivers/nvme/target/io-cmd-bdev.c#L15
On 2019-07-25 12:10 p.m., Greg Kroah-Hartman wrote: > On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: >> >> >> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: >>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: >>>> cdev_get_by_path() attempts to retrieve a struct cdev from >>>> a path name. It is analagous to blkdev_get_by_path(). >>>> >>>> This will be necessary to create a nvme_ctrl_get_by_path()to >>>> support NVMe-OF passthru. >>> >>> Ick, why? Why would a cdev have a "pathname"? >> >> So we can go from "/dev/nvme0" (which points to a char device) to its >> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also >> allows supporting symlinks that might be created by udev rules. >> >> This is very similar to blkdev_get_by_path() that lets regular NVMe-OF >> obtain the struct block_device from a path. >> >> I didn't think this would be all that controversial. >> >>> What is "NVMe-OF passthru"? Why does a char device node have anything >>> to do with NVMe? >> >> NVME-OF passthru is support for NVME over fabrics to directly target a >> regular NVMe controller and thus export an entire NVMe device to a >> remote system. We need to be able to tell the kernel which controller to >> use and IMO a path to the device file is the best way as it allows us to >> support symlinks created by udev. > > open() in userspace handles symlinks just fine, what crazy interface > passes a string to try to find a char device node that is not open()? configfs. Which I'm stuck with seeing nvme-of already uses that for configuration and I don't think that's going to change... > And why do you need a char device at all anyway? Is this just the > "normal" nvme controller's character device node? Yes. Logan
On Thu, Jul 25, 2019 at 12:14:33PM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 12:08 p.m., Greg Kroah-Hartman wrote: > > On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: > >> > >> > >> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > >>> On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > >>>> cdev_get_by_path() attempts to retrieve a struct cdev from > >>>> a path name. It is analagous to blkdev_get_by_path(). > >>>> > >>>> This will be necessary to create a nvme_ctrl_get_by_path()to > >>>> support NVMe-OF passthru. > >>> > >>> Ick, why? Why would a cdev have a "pathname"? > >> > >> So we can go from "/dev/nvme0" (which points to a char device) to its > >> struct cdev and eventually it's struct nvme_ctrl. Doing it this way also > >> allows supporting symlinks that might be created by udev rules. > > > > Why do you have a "string" within the kernel and are not using the > > normal open() call from userspace on the character device node on the > > filesystem in your namespace/mount/whatever? > > NVMe-OF is configured using configfs. The target is specified by the > user writing a path to a configfs attribute. This is the way it works > today but with blkdev_get_by_path()[1]. For the passthru code, we need > to get a nvme_ctrl instead of a block_device, but the principal is the same. Why isn't a fd being passed in there instead of a random string? Seems odd, but oh well, that ship sailed a long time ago for block devices I guess. So what do you actually _do_ with that char device once you have it? greg k-h
On 2019-07-25 12:27 p.m., Greg Kroah-Hartman wrote: >>> Why do you have a "string" within the kernel and are not using the >>> normal open() call from userspace on the character device node on the >>> filesystem in your namespace/mount/whatever? >> >> NVMe-OF is configured using configfs. The target is specified by the >> user writing a path to a configfs attribute. This is the way it works >> today but with blkdev_get_by_path()[1]. For the passthru code, we need >> to get a nvme_ctrl instead of a block_device, but the principal is the same. > > Why isn't a fd being passed in there instead of a random string? I wouldn't know the answer to this but I assume because once we decided to use configfs, there was no way for the user to pass the kernel an fd. > Seems odd, but oh well, that ship sailed a long time ago for block > devices I guess. Yup. > So what do you actually _do_ with that char device once you have it? We lookup the struct nvme_ctrl and use it to submit passed-through NVMe commands directly to the controller. Logan
On Thu, Jul 25, 2019 at 08:27:01PM +0200, Greg Kroah-Hartman wrote: > > NVMe-OF is configured using configfs. The target is specified by the > > user writing a path to a configfs attribute. This is the way it works > > today but with blkdev_get_by_path()[1]. For the passthru code, we need > > to get a nvme_ctrl instead of a block_device, but the principal is the same. > > Why isn't a fd being passed in there instead of a random string? I suppose we could echo a string of the file descriptor number there, and look up the fd in the process' file descriptor table ... I'll get my coat.
>>>> Why do you have a "string" within the kernel and are not using the >>>> normal open() call from userspace on the character device node on the >>>> filesystem in your namespace/mount/whatever? >>> >>> NVMe-OF is configured using configfs. The target is specified by the >>> user writing a path to a configfs attribute. This is the way it works >>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >> >> Why isn't a fd being passed in there instead of a random string? > > I wouldn't know the answer to this but I assume because once we decided > to use configfs, there was no way for the user to pass the kernel an fd. That's definitely not changing. But this is not different than how we use the block device or file configuration, this just happen to need the nvme controller chardev now to issue I/O.
>>> NVMe-OF is configured using configfs. The target is specified by the >>> user writing a path to a configfs attribute. This is the way it works >>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >> >> Why isn't a fd being passed in there instead of a random string? > > I suppose we could echo a string of the file descriptor number there, > and look up the fd in the process' file descriptor table ... Assuming that there is a open handle somewhere out there... > I'll get my coat. Grab mine too..
On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote: > > > > > NVMe-OF is configured using configfs. The target is specified by the > > > > user writing a path to a configfs attribute. This is the way it works > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same. > > > > > > Why isn't a fd being passed in there instead of a random string? > > > > I suppose we could echo a string of the file descriptor number there, > > and look up the fd in the process' file descriptor table ... > > Assuming that there is a open handle somewhere out there... Well, that's how we'd know that the application echoing /dev/nvme3 into configfs actually has permission to access /dev/nvme3. Think about containers, for example. It's not exactly safe to mount configfs in a non-root container since it can access any NVMe device in the system, not just ones which it's been given permission to access. Right?
On 2019-07-25 1:11 p.m., Matthew Wilcox wrote: > On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote: >> >>>>> NVMe-OF is configured using configfs. The target is specified by the >>>>> user writing a path to a configfs attribute. This is the way it works >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >>>> >>>> Why isn't a fd being passed in there instead of a random string? >>> >>> I suppose we could echo a string of the file descriptor number there, >>> and look up the fd in the process' file descriptor table ... >> >> Assuming that there is a open handle somewhere out there... Yes, that would be a step backwards from an interface. The user would then need a special process to open the fd and pass it through configfs. They couldn't just do it with basic bash commands. > Well, that's how we'd know that the application echoing /dev/nvme3 into > configfs actually has permission to access /dev/nvme3. It's the kernel that's accessing the device so it has permission. root permission is required to configure the kernel. > Think about > containers, for example. It's not exactly safe to mount configfs in a > non-root container since it can access any NVMe device in the system, > not just ones which it's been given permission to access. Right? I don't think it really makes any sense to talk about NVMe-of and containers. Though, if we did it would be solely on the configuration interface so that users inside a container might be able to configure a new target for resources they can see and they'd have to have their own view into configfs.... Logan
On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote: > >> Assuming that there is a open handle somewhere out there... > > Yes, that would be a step backwards from an interface. The user would > then need a special process to open the fd and pass it through configfs. > They couldn't just do it with basic bash commands. echo 3 3</dev/nvme3 >/configfs/foor/bar/whatever
>>>>> NVMe-OF is configured using configfs. The target is specified by the >>>>> user writing a path to a configfs attribute. This is the way it works >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >>>> >>>> Why isn't a fd being passed in there instead of a random string? >>> >>> I suppose we could echo a string of the file descriptor number there, >>> and look up the fd in the process' file descriptor table ... >> >> Assuming that there is a open handle somewhere out there... > > Well, that's how we'd know that the application echoing /dev/nvme3 into > configfs actually has permission to access /dev/nvme3. Actually, the application is exposing a target device to someone else, its actually preferable that it doesn't have access to it as its possibly can create a consistency hole, but that is usually a root application anyways... We could verify at least that though.. > Think about > containers, for example. It's not exactly safe to mount configfs in a > non-root container since it can access any NVMe device in the system, > not just ones which it's been given permission to access. Right? I'm seeing this as an equivalent to an application that is binding a socket to an ip address, and the kernel looks-up according to the net namespace that the socket has. I do agree this is an area that was never really thought of. But what you are describing requires infrastructure around it instead of forcing the user to pass an fd to validate around it.
On 2019-07-25 1:26 p.m., Matthew Wilcox wrote: > On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote: >>>> Assuming that there is a open handle somewhere out there... >> >> Yes, that would be a step backwards from an interface. The user would >> then need a special process to open the fd and pass it through configfs. >> They couldn't just do it with basic bash commands. > > echo 3 3</dev/nvme3 >/configfs/foor/bar/whatever Neat. I didn't know you could do that. Logan
On Thu, Jul 25, 2019 at 12:02:30PM -0700, Sagi Grimberg wrote: > > > > > > Why do you have a "string" within the kernel and are not using the > > > > > normal open() call from userspace on the character device node on the > > > > > filesystem in your namespace/mount/whatever? > > > > > > > > NVMe-OF is configured using configfs. The target is specified by the > > > > user writing a path to a configfs attribute. This is the way it works > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same. > > > > > > Why isn't a fd being passed in there instead of a random string? > > > > I wouldn't know the answer to this but I assume because once we decided > > to use configfs, there was no way for the user to pass the kernel an fd. > > That's definitely not changing. But this is not different than how we > use the block device or file configuration, this just happen to need the > nvme controller chardev now to issue I/O. So, as was kind of alluded to in another part of the thread, what are you doing about permissions? It seems that any user/group permissions are out the window when you have the kernel itself do the opening of the char device, right? Why is that ok? You can pass it _any_ character device node and away it goes? What if you give it a "wrong" one? Char devices are very different from block devices this way. thanks, greg k-h
>>>>>> Why do you have a "string" within the kernel and are not using the >>>>>> normal open() call from userspace on the character device node on the >>>>>> filesystem in your namespace/mount/whatever? >>>>> >>>>> NVMe-OF is configured using configfs. The target is specified by the >>>>> user writing a path to a configfs attribute. This is the way it works >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >>>> >>>> Why isn't a fd being passed in there instead of a random string? >>> >>> I wouldn't know the answer to this but I assume because once we decided >>> to use configfs, there was no way for the user to pass the kernel an fd. >> >> That's definitely not changing. But this is not different than how we >> use the block device or file configuration, this just happen to need the >> nvme controller chardev now to issue I/O. > > So, as was kind of alluded to in another part of the thread, what are > you doing about permissions? It seems that any user/group permissions > are out the window when you have the kernel itself do the opening of the > char device, right? Why is that ok? You can pass it _any_ character > device node and away it goes? What if you give it a "wrong" one? Char > devices are very different from block devices this way. We could condition any configfs operation on capable(CAP_NET_ADMIN) to close that hole for now..
On 2019-07-25 1:34 p.m., Greg Kroah-Hartman wrote: > On Thu, Jul 25, 2019 at 12:02:30PM -0700, Sagi Grimberg wrote: >> >>>>>> Why do you have a "string" within the kernel and are not using the >>>>>> normal open() call from userspace on the character device node on the >>>>>> filesystem in your namespace/mount/whatever? >>>>> >>>>> NVMe-OF is configured using configfs. The target is specified by the >>>>> user writing a path to a configfs attribute. This is the way it works >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >>>> >>>> Why isn't a fd being passed in there instead of a random string? >>> >>> I wouldn't know the answer to this but I assume because once we decided >>> to use configfs, there was no way for the user to pass the kernel an fd. >> >> That's definitely not changing. But this is not different than how we >> use the block device or file configuration, this just happen to need the >> nvme controller chardev now to issue I/O. > > So, as was kind of alluded to in another part of the thread, what are > you doing about permissions? It seems that any user/group permissions > are out the window when you have the kernel itself do the opening of the > char device, right? Why is that ok? You can pass it _any_ character > device node and away it goes? What if you give it a "wrong" one? Char > devices are very different from block devices this way. Well the permission question is no different from the block-device case we already have. The user has to be root to configure a target so it has access to the block/char device. Containers and NVMe-of are really not designed to mix and would take a lot of work to make this make any sense (And that's way out of scope of what I'm trying to do here and doesn't change the need for a the cdev_get_by_path()). If the user specifies a non-nvme char device, it is rejected by the code in nvme_ctrl_get_by_path() when it compares the fops. See patch 4. Logan
On Thu, Jul 25, 2019 at 12:37:11PM -0700, Sagi Grimberg wrote: > > > > > > > > Why do you have a "string" within the kernel and are not using the > > > > > > > normal open() call from userspace on the character device node on the > > > > > > > filesystem in your namespace/mount/whatever? > > > > > > > > > > > > NVMe-OF is configured using configfs. The target is specified by the > > > > > > user writing a path to a configfs attribute. This is the way it works > > > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need > > > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same. > > > > > > > > > > Why isn't a fd being passed in there instead of a random string? > > > > > > > > I wouldn't know the answer to this but I assume because once we decided > > > > to use configfs, there was no way for the user to pass the kernel an fd. > > > > > > That's definitely not changing. But this is not different than how we > > > use the block device or file configuration, this just happen to need the > > > nvme controller chardev now to issue I/O. > > > > So, as was kind of alluded to in another part of the thread, what are > > you doing about permissions? It seems that any user/group permissions > > are out the window when you have the kernel itself do the opening of the > > char device, right? Why is that ok? You can pass it _any_ character > > device node and away it goes? What if you give it a "wrong" one? Char > > devices are very different from block devices this way. > > We could condition any configfs operation on capable(CAP_NET_ADMIN) to > close that hole for now.. Why that specific permission? And what about the "pass any random char device name" issue? What happens if you pass /dev/random/ as the string? thanks, greg k-h
>> So, as was kind of alluded to in another part of the thread, what are >> you doing about permissions? It seems that any user/group permissions >> are out the window when you have the kernel itself do the opening of the >> char device, right? Why is that ok? You can pass it _any_ character >> device node and away it goes? What if you give it a "wrong" one? Char >> devices are very different from block devices this way. > > We could condition any configfs operation on capable(CAP_NET_ADMIN) to > close that hole for now.. s/NET/SYS/...
>>> So, as was kind of alluded to in another part of the thread, what are >>> you doing about permissions? It seems that any user/group permissions >>> are out the window when you have the kernel itself do the opening of the >>> char device, right? Why is that ok? You can pass it _any_ character >>> device node and away it goes? What if you give it a "wrong" one? Char >>> devices are very different from block devices this way. >> >> We could condition any configfs operation on capable(CAP_NET_ADMIN) to >> close that hole for now.. > > Why that specific permission? Meant CAP_SYS_ADMIN > And what about the "pass any random char device name" issue? What > happens if you pass /dev/random/ as the string? What is the difference if the application is opening the device if it has the wrong path?
On Thu, Jul 25, 2019 at 01:24:22PM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 1:11 p.m., Matthew Wilcox wrote: > > On Thu, Jul 25, 2019 at 12:05:29PM -0700, Sagi Grimberg wrote: > >> > >>>>> NVMe-OF is configured using configfs. The target is specified by the > >>>>> user writing a path to a configfs attribute. This is the way it works > >>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need > >>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. > >>>> > >>>> Why isn't a fd being passed in there instead of a random string? > >>> > >>> I suppose we could echo a string of the file descriptor number there, > >>> and look up the fd in the process' file descriptor table ... > >> > >> Assuming that there is a open handle somewhere out there... > > Yes, that would be a step backwards from an interface. The user would > then need a special process to open the fd and pass it through configfs. > They couldn't just do it with basic bash commands. First of all, they can, but... WTF not have filp_open() done right there? Yes, by name. With permission checks done. And pick your object from the sodding struct file you'll get. What's the problem? Why do you need cdev lookups, etc., when you are dealing with files under your full control? Just open them and use ->private_data or whatever you set in ->open() to access the damn thing. All there is to it...
>>>>>>> NVMe-OF is configured using configfs. The target is specified by the >>>>>>> user writing a path to a configfs attribute. This is the way it works >>>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, we need >>>>>>> to get a nvme_ctrl instead of a block_device, but the principal is the same. >>>>>> >>>>>> Why isn't a fd being passed in there instead of a random string? >>>>> >>>>> I suppose we could echo a string of the file descriptor number there, >>>>> and look up the fd in the process' file descriptor table ... >>>> >>>> Assuming that there is a open handle somewhere out there... >> >> Yes, that would be a step backwards from an interface. The user would >> then need a special process to open the fd and pass it through configfs. >> They couldn't just do it with basic bash commands. > > First of all, they can, but... WTF not have filp_open() done right there? > Yes, by name. With permission checks done. And pick your object from the > sodding struct file you'll get. > > What's the problem? Why do you need cdev lookups, etc., when you are > dealing with files under your full control? Just open them and use > ->private_data or whatever you set in ->open() to access the damn thing. > All there is to it... Oh this is so much simpler. There is really no point in using anything else. Just need to remember to compare f->f_op to what we expect to make sure that it is indeed the same device class.
On Thu, Jul 25, 2019 at 09:29:40PM -0700, Sagi Grimberg wrote: > > > > > > > > > NVMe-OF is configured using configfs. The target is specified by the > > > > > > > > user writing a path to a configfs attribute. This is the way it works > > > > > > > > today but with blkdev_get_by_path()[1]. For the passthru code, we need > > > > > > > > to get a nvme_ctrl instead of a block_device, but the principal is the same. > > > > > > > > > > > > > > Why isn't a fd being passed in there instead of a random string? > > > > > > > > > > > > I suppose we could echo a string of the file descriptor number there, > > > > > > and look up the fd in the process' file descriptor table ... > > > > > > > > > > Assuming that there is a open handle somewhere out there... > > > > > > Yes, that would be a step backwards from an interface. The user would > > > then need a special process to open the fd and pass it through configfs. > > > They couldn't just do it with basic bash commands. > > > > First of all, they can, but... WTF not have filp_open() done right there? > > Yes, by name. With permission checks done. And pick your object from the > > sodding struct file you'll get. > > > > What's the problem? Why do you need cdev lookups, etc., when you are > > dealing with files under your full control? Just open them and use > > ->private_data or whatever you set in ->open() to access the damn thing. > > All there is to it... > Oh this is so much simpler. There is really no point in using anything > else. Just need to remember to compare f->f_op to what we expect to make > sure that it is indeed the same device class. That is good, that solves the "/dev/random/" issue I was talking about earlier as well. Odds are you all can do the same for the blockdev interface as well. thanks, greg k-h
On 2019-07-25 10:29 p.m., Sagi Grimberg wrote: > >>>>>>>> NVMe-OF is configured using configfs. The target is specified by >>>>>>>> the >>>>>>>> user writing a path to a configfs attribute. This is the way it >>>>>>>> works >>>>>>>> today but with blkdev_get_by_path()[1]. For the passthru code, >>>>>>>> we need >>>>>>>> to get a nvme_ctrl instead of a block_device, but the principal >>>>>>>> is the same. >>>>>>> >>>>>>> Why isn't a fd being passed in there instead of a random string? >>>>>> >>>>>> I suppose we could echo a string of the file descriptor number there, >>>>>> and look up the fd in the process' file descriptor table ... >>>>> >>>>> Assuming that there is a open handle somewhere out there... >>> >>> Yes, that would be a step backwards from an interface. The user would >>> then need a special process to open the fd and pass it through configfs. >>> They couldn't just do it with basic bash commands. >> >> First of all, they can, but... WTF not have filp_open() done right there? >> Yes, by name. With permission checks done. And pick your object from >> the >> sodding struct file you'll get. >> >> What's the problem? Why do you need cdev lookups, etc., when you are >> dealing with files under your full control? Just open them and use >> ->private_data or whatever you set in ->open() to access the damn thing. >> All there is to it... > Oh this is so much simpler. There is really no point in using anything > else. Just need to remember to compare f->f_op to what we expect to make > sure that it is indeed the same device class. Yes, that sounds like a good idea. I'll do this for v2. Thanks, Logan
diff --git a/fs/char_dev.c b/fs/char_dev.c index 5cc53bd5f654..25037d642ff8 100644 --- a/fs/char_dev.c +++ b/fs/char_dev.c @@ -22,6 +22,7 @@ #include <linux/mutex.h> #include <linux/backing-dev.h> #include <linux/tty.h> +#include <linux/namei.h> #include "internal.h" @@ -403,6 +404,38 @@ static struct cdev *cdev_lookup(struct inode *inode) return p; } +struct cdev *cdev_get_by_path(const char *pathname) +{ + struct inode *inode; + struct cdev *cdev; + struct path path; + int error; + + if (!pathname || !*pathname) + return ERR_PTR(-EINVAL); + + error = kern_path(pathname, LOOKUP_FOLLOW, &path); + if (error) + return ERR_PTR(error); + + if (!may_open_dev(&path)) { + cdev = ERR_PTR(-EACCES); + goto out; + } + + inode = d_backing_inode(path.dentry); + if (!S_ISCHR(inode->i_mode)) { + cdev = ERR_PTR(-EINVAL); + goto out; + } + + cdev = cdev_lookup(inode); + +out: + path_put(&path); + return cdev; +} + /* * Called every time a character special file is opened */ @@ -690,5 +723,6 @@ EXPORT_SYMBOL(cdev_add); EXPORT_SYMBOL(cdev_set_parent); EXPORT_SYMBOL(cdev_device_add); EXPORT_SYMBOL(cdev_device_del); +EXPORT_SYMBOL(cdev_get_by_path); EXPORT_SYMBOL(__register_chrdev); EXPORT_SYMBOL(__unregister_chrdev); diff --git a/include/linux/cdev.h b/include/linux/cdev.h index 0e8cd6293deb..c7f2df2ca69a 100644 --- a/include/linux/cdev.h +++ b/include/linux/cdev.h @@ -24,6 +24,7 @@ void cdev_init(struct cdev *, const struct file_operations *); struct cdev *cdev_alloc(void); +struct cdev *cdev_get_by_path(const char *pathname); void cdev_put(struct cdev *p); int cdev_add(struct cdev *, dev_t, unsigned);
cdev_get_by_path() attempts to retrieve a struct cdev from a path name. It is analagous to blkdev_get_by_path(). This will be necessary to create a nvme_ctrl_get_by_path()to support NVMe-OF passthru. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> --- fs/char_dev.c | 34 ++++++++++++++++++++++++++++++++++ include/linux/cdev.h | 1 + 2 files changed, 35 insertions(+)