Message ID | 1541867733-7836-1-git-send-email-pbonzini@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | SG_IO command filtering via sysfs | expand |
I wonder if a better way of adding SG_IO command filtering is via eBPF? We are currently carrying a inside Google a patch which allows a specific of SCSI commands to non-root processes --- if the process belonged to a particular Unix group id. It's pretty specific to our use case, in terms of the specific SCSI commands we want to allow through. I can imagine people wanting different filters based on the type of the SCSI device, or a HDD's WWID, not just a group id. For example, this might be useful for people wanting to do crazy things with containers --- maybe you'd want to allow container root to send a SANITIZE ERASE command to one of its exclusively assigned disks, but not to other HDD's. So having something that's more general than a flat file in sysfs might be preferable to resurrecting an interface which we would then after to support forever, even if we come up with a more general interface. - Ted
I think this goes in the wrong way. There isn't really any point in filtering at all if we have access to the whole device by the file persmissions, and we generally should not allow any access for partitions. I think we need to simplify the selection, not add crazy amounts of special case code.
On 10/11/2018 20:05, Theodore Y. Ts'o wrote: > I wonder if a better way of adding SG_IO command filtering is via > eBPF? We are currently carrying a inside Google a patch which allows > a specific of SCSI commands to non-root processes --- if the process > belonged to a particular Unix group id. > > It's pretty specific to our use case, in terms of the specific SCSI > commands we want to allow through. I can imagine people wanting > different filters based on the type of the SCSI device, or a HDD's > WWID, not just a group id. For example, this might be useful for > people wanting to do crazy things with containers --- maybe you'd > want to allow container root to send a SANITIZE ERASE command to one > of its exclusively assigned disks, but not to other HDD's. > > So having something that's more general than a flat file in sysfs > might be preferable to resurrecting an interface which we would then > after to support forever, even if we come up with a more general > interface. Heh, this was exactly the answer I dreaded, because I can't deny it makes sense. :) My main argument against it is that while superseding an interface and still having to support it forever sucks, having a super-complex interface is also bad (back in 2012 I wrote https://lwn.net/Articles/501742/ which I'm not particularly enthusiastic about). In many cases a combination of MAC policies, ACLs, etc. can be just as effective. I'm not very eBPF savvy, the question I have is: what kind of information about the running process is available in an eBPF program? For example, even considering only the examples you make, would it be able to access the CDB, the capabilities and uid/gid of the task, the SCSI device type, the WWN? Of course you also need the mode of the file descriptor in order to allow SANITIZE ERASE if the disk is opened for write. Paolo
On Sun, Nov 11, 2018 at 05:14:45AM -0800, Christoph Hellwig wrote: > I think this goes in the wrong way. There isn't really any point > in filtering at all if we have access to the whole device by the > file persmissions, and we generally should not allow any access for > partitions. It really depends on the security model being used on a particular system. I can easily imagine scenarios where userspace is allowed full access to the device with respect to read/write/open, but the security model doesn't want to allow access to various SCSI commands such as firmware upload commands, TCG commads, the soon-to-be-standardized Zone Activation Commands (which allow dynamic conversion of HDD recording modes between CMR and SMR), etc. And this is before we get to crazy container / namespace scenarios. And *no*, let's not have a SG_IO namespace! :-) > I think we need to simplify the selection, not add crazy amounts of > special case code. I have the opposite opinions in terms of wanting more complex filtering rules, but I also agree that special case C code is not the answer --- and why I suggested that eBPF filtering rules is the right way to go. - Ted
On Sun, Nov 11, 2018 at 02:26:45PM +0100, Paolo Bonzini wrote: > > I'm not very eBPF savvy, the question I have is: what kind of > information about the running process is available in an eBPF program? > For example, even considering only the examples you make, would it be > able to access the CDB, the capabilities and uid/gid of the task, the > SCSI device type, the WWN? Of course you also need the mode of the file > descriptor in order to allow SANITIZE ERASE if the disk is opened for write. The basic uid/gid of the task is certainly available. For storage stack specific things, it's a matter of what we make available to the eBPF function. For example, there is an experimental netfilter replacement which uses eBPF; obviously that requires making the packet which is being inspecting so it can be given a thumbs up or thumbs down result. That's going to require letting the eBPF function to have access to the network header, access to the connection tracking tables, etc. You can basically think of it as passing arguments to a function except it's it's written in eBPF instead of C. You can also define eBPF functions implemented in C which can be called from eBPF code. - Ted
On Sun, Nov 11, 2018 at 08:42:42AM -0500, Theodore Y. Ts'o wrote: > It really depends on the security model being used on a particular > system. I can easily imagine scenarios where userspace is allowed > full access to the device with respect to read/write/open, but the > security model doesn't want to allow access to various SCSI commands > such as firmware upload commands, TCG commads, the > soon-to-be-standardized Zone Activation Commands (which allow dynamic > conversion of HDD recording modes between CMR and SMR), etc. Well, that's what we have the security_file_ioctl() LSM hook for so that your security model can arbitrate access to ioctls.
On 12/11/2018 09:20, Christoph Hellwig wrote: > On Sun, Nov 11, 2018 at 08:42:42AM -0500, Theodore Y. Ts'o wrote: >> It really depends on the security model being used on a particular >> system. I can easily imagine scenarios where userspace is allowed >> full access to the device with respect to read/write/open, but the >> security model doesn't want to allow access to various SCSI commands >> such as firmware upload commands, TCG commads, the >> soon-to-be-standardized Zone Activation Commands (which allow dynamic >> conversion of HDD recording modes between CMR and SMR), etc. > > Well, that's what we have the security_file_ioctl() LSM hook for so that > your security model can arbitrate access to ioctls. Doesn't that have TOC-TOU races by design? Also, what about SG_IO giving write access to files that are only opened read-only (and only have read permissions)? Paolo
On 11/11/2018 15:14, Theodore Y. Ts'o wrote: > On Sun, Nov 11, 2018 at 02:26:45PM +0100, Paolo Bonzini wrote: >> >> I'm not very eBPF savvy, the question I have is: what kind of >> information about the running process is available in an eBPF program? >> For example, even considering only the examples you make, would it be >> able to access the CDB, the capabilities and uid/gid of the task, the >> SCSI device type, the WWN? Of course you also need the mode of the file >> descriptor in order to allow SANITIZE ERASE if the disk is opened for write. > > The basic uid/gid of the task is certainly available. For storage > stack specific things, it's a matter of what we make available to the > eBPF function. For example, there is an experimental netfilter > replacement which uses eBPF; obviously that requires making the packet > which is being inspecting so it can be given a thumbs up or thumbs > down result. That's going to require letting the eBPF function to > have access to the network header, access to the connection tracking > tables, etc. Yeah, and there are even already helpers such as bpf_get_current_uid_gid. So that part can be done in a sort-of generic way. I can try and do the work, but I'd like some agreement on the design first... For example a more important question is how would the BPF filter be attached? Two possibilities that come to mind are: - add it to the /dev/sg* or /dev/sd* struct file(*) via a ioctl, and use pass the file descriptor to the unprivileged QEMU after setting up the BPF filter, via either fork() or SCM_RIGHTS. This would be a very nice model for privilege separation, but I'm afraid it would not work for your use case - add BPF programs to cgroups, in the form of a new BPF_PROG_TYPE_CGROUP_CDB_FILTER or something like that. That would also work for my usecase, and it seems to be in line with how the network guys are doing things. So it would seem like the way to go. Some other details... Registering the first cgroup-based filter would disable the default filter; if multiple filters are attached, the outcomes of all filters would be AND-ed, also similarly to how socket and sockaddr cgroup BPF works. Finally, filters would be applied also to processes with CAP_SYS_RAWIO, unlike the current filter. Needless to say, this would not add special case code, but it would still add a substantial amount of code, probably comparable to this series. Christoph? Paolo (*) that's not immediate for /dev/sd*, because it would require using the block device file's private_data; that's not possible yet via struct block_device_operations, but as far as I can tell block devices themselves do not need the private_data, so it is at least doable.
On Fri, 2018-11-16 at 01:26 +0100, Paolo Bonzini wrote: > Yeah, and there are even already helpers such as > bpf_get_current_uid_gid. So that part can be done in a sort-of generic way. > > I can try and do the work, but I'd like some agreement on the design > first... For example a more important question is how would the BPF > filter be attached? Two possibilities that come to mind are: > > - add it to the /dev/sg* or /dev/sd* struct file(*) via a ioctl, and use > pass the file descriptor to the unprivileged QEMU after setting up the > BPF filter, via either fork() or SCM_RIGHTS. This would be a very nice > model for privilege separation, but I'm afraid it would not work for > your use case > > - add BPF programs to cgroups, in the form of a new > BPF_PROG_TYPE_CGROUP_CDB_FILTER or something like that. That would also > work for my usecase, and it seems to be in line with how the network > guys are doing things. So it would seem like the way to go. > > Some other details... Registering the first cgroup-based filter would > disable the default filter; if multiple filters are attached, the > outcomes of all filters would be AND-ed, also similarly to how socket > and sockaddr cgroup BPF works. Finally, filters would be applied also > to processes with CAP_SYS_RAWIO, unlike the current filter. > > Needless to say, this would not add special case code, but it would > still add a substantial amount of code, probably comparable to this series. All user space interfaces in the Linux kernel for storage that I'm familiar with not only allow configuration of parameters but also make it easy to query which parameters have been configured. The existing sysfs and configfs interfaces demonstrate this. Using BPF to configure SG/IO access has a significant disadvantage, namely that it is very hard to figure out what has been configured. Figuring out what has been configured namely requires disassembling BPF. I'm not sure anyone will be enthusiast about this. Bart.
On 16/11/18 01:37, Bart Van Assche wrote: > All user space interfaces in the Linux kernel for storage that I'm familiar > with not only allow configuration of parameters but also make it easy to > query which parameters have been configured. The existing sysfs and configfs > interfaces demonstrate this. Using BPF to configure SG/IO access has a > significant disadvantage, namely that it is very hard to figure out what has > been configured. Figuring out what has been configured namely requires > disassembling BPF. I'm not sure anyone will be enthusiast about this. Well, that's a problem with BPF in general. With great power comes great obscurability. Paolo
On Mon, Nov 12, 2018 at 11:17:29AM +0100, Paolo Bonzini wrote: > > Well, that's what we have the security_file_ioctl() LSM hook for so that > > your security model can arbitrate access to ioctls. > > Doesn't that have TOC-TOU races by design? If you want to look at the command - yes. If you just want to filter read vs write vs ioctl, no. > Also, what about SG_IO giving write access to files that are only opened > read-only (and only have read permissions)? Allowing SG_IO on read-only permissions sounds like a reall bad idea, filtering or not.
On 16/11/18 10:32, Christoph Hellwig wrote: > On Mon, Nov 12, 2018 at 11:17:29AM +0100, Paolo Bonzini wrote: >>> Well, that's what we have the security_file_ioctl() LSM hook for so that >>> your security model can arbitrate access to ioctls. >> >> Doesn't that have TOC-TOU races by design? > > If you want to look at the command - yes. If you just want to filter > read vs write vs ioctl, no. Yeah, but looking at the command is what Ted wants. The thing that we did in RHEL was a single sysfs bool that allows unfiltered access, because it was sort of enough and made the delta very small. But for upstream I want to do it right, even if that means learning all that new-fangled BPF stuff. :) >> Also, what about SG_IO giving write access to files that are only opened >> read-only (and only have read permissions)? > > Allowing SG_IO on read-only permissions sounds like a reall bad idea, > filtering or not. I would even agree, however it's allowed right now and I would be surprised if no one was relying on it in good faith ("I'm just doing an INQUIRY, why do I need to open O_RDWR"). And indeed: $ sudo chmod a+r /dev/sda $ strace -e openat sg_inq /dev/sda openat(AT_FDCWD, "/dev/sda", O_RDONLY|O_NONBLOCK) = 3 ^^^^^^^^ So it would be a regression. Paolo
On Fri, Nov 16, 2018 at 10:45:11AM +0100, Paolo Bonzini wrote: > Yeah, but looking at the command is what Ted wants. The thing that we > did in RHEL was a single sysfs bool that allows unfiltered access, > because it was sort of enough and made the delta very small. But for > upstream I want to do it right, even if that means learning all that > new-fangled BPF stuff. :) So what is this magic command? > I would even agree, however it's allowed right now and I would be > surprised if no one was relying on it in good faith ("I'm just doing an > INQUIRY, why do I need to open O_RDWR"). And indeed: > > $ sudo chmod a+r /dev/sda > $ strace -e openat sg_inq /dev/sda > openat(AT_FDCWD, "/dev/sda", O_RDONLY|O_NONBLOCK) = 3 Well, not if we only did that for unprivileged opens.
On Fri, Nov 16, 2018 at 08:01:29AM +0100, Paolo Bonzini wrote: > On 16/11/18 01:37, Bart Van Assche wrote: > > All user space interfaces in the Linux kernel for storage that I'm familiar > > with not only allow configuration of parameters but also make it easy to > > query which parameters have been configured. The existing sysfs and configfs > > interfaces demonstrate this. Using BPF to configure SG/IO access has a > > significant disadvantage, namely that it is very hard to figure out what has > > been configured. Figuring out what has been configured namely requires > > disassembling BPF. I'm not sure anyone will be enthusiast about this. > > Well, that's a problem with BPF in general. With great power comes > great obscurability. You can also make the same complaint about kernel modules; that it's impossible to figure exactly what a kernel modules does without disassembling them. However, you can a one-line description of what it does using modinfo: % modinfo async_pq filename: /lib/modules/4.19.0-00022-g831156939ae8/kernel/crypto/async_tx/async_pq.ko license: GPL description: asynchronous raid6 syndrome generation/validation srcversion: 529102C736C4FED181C15A8 depends: raid6_pq,async_tx,async_xor retpoline: Y intree: Y name: async_pq vermagic: 4.19.0-00022-g831156939ae8 SMP mod_unload modversions - Ted
On Fri, Nov 16, 2018 at 10:45:11AM +0100, Paolo Bonzini wrote: > On 16/11/18 10:32, Christoph Hellwig wrote: > > On Mon, Nov 12, 2018 at 11:17:29AM +0100, Paolo Bonzini wrote: > >>> Well, that's what we have the security_file_ioctl() LSM hook for so that > >>> your security model can arbitrate access to ioctls. > >> > >> Doesn't that have TOC-TOU races by design? > > > > If you want to look at the command - yes. If you just want to filter > > read vs write vs ioctl, no. > > Yeah, but looking at the command is what Ted wants. The thing that we > did in RHEL was a single sysfs bool that allows unfiltered access, > because it was sort of enough and made the delta very small. But for > upstream I want to do it right, even if that means learning all that > new-fangled BPF stuff. :) I'd argue that a purpose-built eBPF access control facility is superior to the security_file_ioctl() LSM hook because it can make available to the authorization function access to the cached results of the SCSI INQUIRY command, and it avoids needing to duplicate knowledge of how to parse the parameters of the SG_IO ioctl in the LSM module as well as in the SCSI stack. Just because you *could* implement anything in terms of a turing machine tape doesn't mean that it is good idea. Similarly, just because you *can* implement something as an LSM hook doesn't mean that it's the best design. > >> Also, what about SG_IO giving write access to files that are only opened > >> read-only (and only have read permissions)? > > > > Allowing SG_IO on read-only permissions sounds like a reall bad idea, > > filtering or not. > > I would even agree, however it's allowed right now and I would be > surprised if no one was relying on it in good faith ("I'm just doing an > INQUIRY, why do I need to open O_RDWR"). And indeed: > > $ sudo chmod a+r /dev/sda > $ strace -e openat sg_inq /dev/sda > openat(AT_FDCWD, "/dev/sda", O_RDONLY|O_NONBLOCK) = 3 > ^^^^^^^^ > > So it would be a regression. Ugh, that's... unfortunate. I suppose we could try to figure out all of the SCSI commands that would have to be white-listed to be allowed using O_RDONLY from historical usage, but that would be a huge job, and it's highly likely we would miss some anyway. OTOH, this could be called a security botch that should be fixed, and if we make a best effort to white list all of the innocuous cases such as SCSI INQUIRY, maybe that would be acceptible. - Ted
On Fri, 2018-11-16 at 12:43 -0500, Theodore Y. Ts'o wrote: > I'd argue that a purpose-built eBPF access control facility is > superior to the security_file_ioctl() LSM hook because it can make > available to the authorization function access to the cached results > of the SCSI INQUIRY command, and it avoids needing to duplicate > knowledge of how to parse the parameters of the SG_IO ioctl in the LSM > module as well as in the SCSI stack. If an eBPF program would decide which SG_IO commands will be executed and which ones not, does that mean that a SCSI parser would have to be implemented in eBPF? If so, does that mean that both the eBPF and the LSM approach share the disadvantage of requiring to do SCSI CDB parsing outside the SCSI core? Bart.
On 16/11/18 19:17, Bart Van Assche wrote: > On Fri, 2018-11-16 at 12:43 -0500, Theodore Y. Ts'o wrote: >> I'd argue that a purpose-built eBPF access control facility is >> superior to the security_file_ioctl() LSM hook because it can make >> available to the authorization function access to the cached results >> of the SCSI INQUIRY command, and it avoids needing to duplicate >> knowledge of how to parse the parameters of the SG_IO ioctl in the LSM >> module as well as in the SCSI stack. > > If an eBPF program would decide which SG_IO commands will be executed > and which ones not, does that mean that a SCSI parser would have to be > implemented in eBPF? If so, does that mean that both the eBPF and the > LSM approach share the disadvantage of requiring to do SCSI CDB parsing > outside the SCSI core? The LSM approach cannot do SCSI CDB parsing, unless you add a special SCSI-specific hook called after parsing the SG_IO argument, due to race conditions. I'd rather not do that, however it would have that disadvantage indeed. The eBPF approach pushes the policy and the parsing entirely to userspace, so I'm not sure you can say it's a disadvantage. It's just a different design. If you use SG_IO you're already in for writing userspace code that handles CDBs, sense data etc. Paolo