Message ID | 1452787318-29610-4-git-send-email-stefanb@us.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > + pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num, > + NULL, 0); This seems strange, something like this should not be creating platform_devices. tpm's cannot be attached to platform_devices outside their probe functions, I just fixed that bug in tpm_tis. Attach the tpm to a device on the vtpm_class? Do it through a callback that takes care of devm? IIRC there is one but the name escapes me now :) > + vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev, > + &vtpm_tpm_ops, > + VTPM_DEV_PREFIX_CLIENT"%d"); I agree with Jarkko I don't see why these deserve special names.. Presumably some namespace magic can be used to make them show up as tpm0 in a container? > + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_NO_SYSFS | TPM_CHIP_FLAG_NO_LOG; Why no sysfs files? > + switch (ioctl) { > + case VTPM_NEW_DEV: > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + vtpm_new_pair_p = argp; > + if (copy_from_user(&vtpm_new_pair, vtpm_new_pair_p, > + sizeof(vtpm_new_pair))) > + return -EFAULT; > + vtpm_dev = vtpm_create_device_pair(&vtpm_new_pair); > + if (IS_ERR(vtpm_dev)) > + return PTR_ERR(vtpm_dev); > + if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair, > + sizeof(vtpm_new_pair))) > + return -EFAULT; Another way to handle this is to return the FD for the server side of the new TPM here and then route accesses from the /dev/tpmX side to that FD. Avoid the extra char device. There is some kernel infrasturcture that helpd do this (anon fd or something, IIRC) For this usage that might make more sense than trying to copy the ptmx scheme, which is like that for other reasons.. > + case VTPM_DEL_DEV: > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; Then del becomes close() on the special fd > + case VTPM_GET_VTPMDEV: And this wouldn't be needed Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016 06:51:07 PM: > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > > + pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num, > > + NULL, 0); > > This seems strange, something like this should not be creating > platform_devices. Should it be a misc_device then ? > > tpm's cannot be attached to platform_devices outside their probe > functions, I just fixed that bug in tpm_tis. > > Attach the tpm to a device on the vtpm_class? Do it through a callback > that takes care of devm? IIRC there is one but the name escapes me now > :) > > > + vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev, > > + &vtpm_tpm_ops, > > + VTPM_DEV_PREFIX_CLIENT"%d"); > > I agree with Jarkko I don't see why these deserve special names.. I did that because it becomes confusing when there are many /dev/tpm%d type of devices. This way we can distinguish between a hardware /dev/tpm%d and all these types of devices. > > Presumably some namespace magic can be used to make them show up as > tpm0 in a container? The magic is to have the container management stack create the device pair. From the ioctl it learns the name of the devices that were created and it then finds out about the major/minor number of the created device and have /dev/tpm0 with that major/minor created in the container's /dev directory. > > > + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_NO_SYSFS | TPM_CHIP_FLAG_NO_LOG; > > Why no sysfs files? I have changed this now to the following + if (vtpm_dev->flags & VTPM_FLAG_NO_SYSFS) + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_NO_SYSFS; + + if (vtpm_dev->flags & VTPM_FLAG_NO_LOG) + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_NO_LOG; + https://github.com/stefanberger/linux/commit/5dd68fef872f10261521c91c23b1d0dfbdf69996#diff-0d6b7dbf28fbfdd13909c2e1b5a6d7a3R541 The issue with sysfs is that sysfs is showing directory entries for all vTPMs in all containers. If we don't want users to see the PCRs of other containers we set this flag. The side effect of setting this flag is that the container will not be able to see its own TPM device, either, but presumably the user has commands to reads PCRs etc. Regarding VTPM_FLAG_NO_LOG: Container's boot/start without firmware that would write an ACPI log. So in most cases one will choose the VTPM_FLAG_NO_LOG flag since there is no log. Maybe that's a flag the user shouldn't have control over. > > > + switch (ioctl) { > > + case VTPM_NEW_DEV: > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > + vtpm_new_pair_p = argp; > > + if (copy_from_user(&vtpm_new_pair, vtpm_new_pair_p, > > + sizeof(vtpm_new_pair))) > > + return -EFAULT; > > + vtpm_dev = vtpm_create_device_pair(&vtpm_new_pair); > > + if (IS_ERR(vtpm_dev)) > > + return PTR_ERR(vtpm_dev); > > + if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair, > > + sizeof(vtpm_new_pair))) > > + return -EFAULT; > > Another way to handle this is to return the FD for the server side of > the new TPM here and then route accesses from the /dev/tpmX side to > that FD. Avoid the extra char device. There is some kernel > infrasturcture that helpd do this (anon fd or something, IIRC) Hm, I would prefer to keep the server-side char device since that way one isn't tied to fd passing or so and the TPM could terminate and be restarted later on on top of that server side device while keeping the client side device usable beyond the server side shutting down. > > For this usage that might make more sense than trying to copy the ptmx > scheme, which is like that for other reasons.. :-) Yes, ptmx is where I took this from. > > > + case VTPM_DEL_DEV: > > + if (!capable(CAP_SYS_ADMIN)) > > + return -EPERM; > > Then del becomes close() on the special fd And with that the client side device would *have* to close as well because the backend now is inaccessible? > > > + case VTPM_GET_VTPMDEV: > > And this wouldn't be needed Assuming we want to keep server side for longer time, this ioctl and VTPM_GET_TPMDEV help to determine the other side of device that found in /dev. Thanks for looking at the patches. I would prefer to keep the two character devices rather than the server side only being represented with an fd. Stefan > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016 > 06:51:07 PM: > > > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > > > + pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num, > > > + NULL, 0); > > > > This seems strange, something like this should not be creating > > platform_devices. > Should it be a misc_device then ? No. Check what other virtual devices are doing.. Generally speaking, the tpm core should not insist on a parent device the way it does. That is an artifact of the clean up I did of the drivers that was never fully compelted. To make the driver migration easier the TPM core is using devm in places it probably shouldn't, but unwinding that requires more difficult all-driver changes. If you get stuck here you will have to fix the core code. > > > + vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev, > > > + &vtpm_tpm_ops, > > > + VTPM_DEV_PREFIX_CLIENT"%d"); > > > > I agree with Jarkko I don't see why these deserve special names.. > I did that because it becomes confusing when there are many /dev/tpm%d > type of devices. This way we can distinguish between a hardware > /dev/tpm%d and all these types of devices. That isn't really consistent with how linux kernel naming works these days. Use a udev rule in userspace if you really want this. > > Presumably some namespace magic can be used to make them show up as > > tpm0 in a container? > The magic is to have the container management stack create the device > pair. From the ioctl it learns the name of the devices that were > created and it then finds out about the major/minor number of the > created device and have /dev/tpm0 with that major/minor created in the > container's /dev directory. Except that isn't good enough - the IMA kernel side doesn't know that this tpm is now acting as the 'main' 'default' TPM. > The issue with sysfs is that sysfs is showing directory entries for all > vTPMs in all containers. That is not a problem to solve here. The sysfs will also show entires for hardware tpms too. You need somekind of namespace related solution that works for all tpms. I don't know what became of it, but there used to be 'device namespaces' patch series that delt with this kind of problem. Again, I suspect strongly this will need some kind of TPM/IMA namespace to actually do what you want. > If we don't want users to see the PCRs of other containers we set > this flag. The side effect of setting this flag is that the > container will not be able to see its own TPM device, either, but > presumably the user has commands to reads PCRs etc. Erm, sysfs is just an obvious symptom, the container could also just create the dev node and access any tpm it likes, or use some future (?) IMA API with a tpm device number. > VTPM_FLAG_NO_LOG: Container's boot/start without firmware that > would write an ACPI log. So in most cases one will choose the > VTPM_FLAG_NO_LOG flag since there is no log. Maybe that's a flag > the user shouldn't have control over. ? But there is no ACPI handle so this should already be naturally defeated by the core code. Why a flag? > > Another way to handle this is to return the FD for the server side of > > the new TPM here and then route accesses from the /dev/tpmX side to > > that FD. Avoid the extra char device. There is some kernel > > infrasturcture that helpd do this (anon fd or something, IIRC) > Hm, I would prefer to keep the server-side char device since that way > one isn't tied to fd passing or so and the TPM could terminate and be > restarted later on on top of that server side device while keeping the > client side device usable beyond the server side shutting down. Don't be afraid of FD passing. vTPM daemon restart isn't really feasible anyhow as TPMs are stateful. The kernel is not prepared to survive a HW TPM reboot. If that problem were to be solved somehow then a scheme like systemd's FDSTORE=1 to keep the fd alive during a daemon crash could be used. The huge downside to using a master side dev node is that these things will leak. Killing the vtpm daemon will not clean up the slave interface and another command and sysadmin interaction will be needed to sort out the inevitable mess. > > Then del becomes close() on the special fd > And with that the client side device would *have* to close as well > because the backend now is inaccessible? The client side would see failures on all TPM commands it tries to Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/20/2016 08:17:01 PM: > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016 > > 06:51:07 PM: > > > > > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > > > > + pdev = platform_device_register_simple("tpm_vtpm", > vtpm_dev->dev_num, > > > > + NULL, 0); > > > > > > This seems strange, something like this should not be creating > > > platform_devices. > > Should it be a misc_device then ? > > No. Check what other virtual devices are doing.. register_chrdev maybe? > > Generally speaking, the tpm core should not insist on a parent device > the way it does. That is an artifact of the clean up I did of the > drivers that was never fully compelted. To make the driver migration > easier the TPM core is using devm in places it probably shouldn't, but > unwinding that requires more difficult all-driver changes. > > If you get stuck here you will have to fix the core code. > > > > > + vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev, > > > > + &vtpm_tpm_ops, > > > > + VTPM_DEV_PREFIX_CLIENT"%d"); > > > > > > I agree with Jarkko I don't see why these deserve special names.. > > I did that because it becomes confusing when there are many /dev/tpm%d > > type of devices. This way we can distinguish between a hardware > > /dev/tpm%d and all these types of devices. > > That isn't really consistent with how linux kernel naming works these > days. Use a udev rule in userspace if you really want this. Ok, then I will just try to call them /dev/tpm%d. > > > > Presumably some namespace magic can be used to make them show up as > > > tpm0 in a container? > > The magic is to have the container management stack create the device > > pair. From the ioctl it learns the name of the devices that were > > created and it then finds out about the major/minor number of the > > created device and have /dev/tpm0 with that major/minor created in the > > container's /dev directory. > > Except that isn't good enough - the IMA kernel side doesn't know that this > tpm is now acting as the 'main' 'default' TPM. Hooking the vTPM to IMA requires another patch that I haven't shown since IMA namespacing isn't public yet. Basically we implement another ioctl() that is to be called before the clone() in order to 'reserve' a vtpm device pair for the calling process. During the clone() call IMA namespacing code can query the vTPM driver for a 'reserved' device pair. Hooking IMA up after the clone() may also work but in case of docker/golang it's better to do this before since the language libraries do a lot after the clone automatically. > > > The issue with sysfs is that sysfs is showing directory entries for all > > vTPMs in all containers. > > That is not a problem to solve here. The sysfs will also show entires > for hardware tpms too. You need somekind of namespace related solution > that works for all tpms. > > I don't know what became of it, but there used to be 'device > namespaces' patch series that delt with this kind of problem. Would be good to have now since I for me the best way is to work around the problem entirely -- not registering with sysfs. > > Again, I suspect strongly this will need some kind of TPM/IMA > namespace to actually do what you want. So here things aren't so clear to me, either. Sysfs should really only show the devices that are relevant to a container, but it seems to show a lot more than that, serial bus for example, i8042 etc. , that don't necessarily have relevance inside the container. So if there's a hardware TPM on the system, it probably shouldn't show the sysfs entries in the container. If we add a vTPM for the container, it would ideally show up as tpm0 inside the container's sysfs entries, but it doesn't do that unless this is the first device created and there's no hardware TPM. So while all of this isn't possible with sysfs, I made the choice to hide the device from sysfs entirely. If hw tpm0 shows, fine, we can report BIOS measurements then. It seems like device subsystem namespacing would be needed to realize part of this so that multiple tpm0s could exist, one in each device namespace. > > > If we don't want users to see the PCRs of other containers we set > > this flag. The side effect of setting this flag is that the > > container will not be able to see its own TPM device, either, but > > presumably the user has commands to reads PCRs etc. > > Erm, sysfs is just an obvious symptom, the container could also just > create the dev node and access any tpm it likes, or use some future > (?) IMA API with a tpm device number. No, the device cgroup prevents that. Only the major/minor number of its dedicated TPM (client) chardev can be accessed. > > > VTPM_FLAG_NO_LOG: Container's boot/start without firmware that > > would write an ACPI log. So in most cases one will choose the > > VTPM_FLAG_NO_LOG flag since there is no log. Maybe that's a flag > > the user shouldn't have control over. > > ? But there is no ACPI handle so this should already be naturally > defeated by the core code. Why a flag? > > > > Another way to handle this is to return the FD for the server side of > > > the new TPM here and then route accesses from the /dev/tpmX side to > > > that FD. Avoid the extra char device. There is some kernel > > > infrasturcture that helpd do this (anon fd or something, IIRC) > > > Hm, I would prefer to keep the server-side char device since that way > > one isn't tied to fd passing or so and the TPM could terminate and be > > restarted later on on top of that server side device while keeping the > > client side device usable beyond the server side shutting down. > > Don't be afraid of FD passing. > > vTPM daemon restart isn't really feasible anyhow as TPMs are > stateful. The kernel is not prepared to survive a HW TPM reboot. If > that problem were to be solved somehow then a scheme like systemd's > FDSTORE=1 to keep the fd alive during a daemon crash could be used. > > The huge downside to using a master side dev node is that these things > will leak. Killing the vtpm daemon will not clean up the slave > interface and another command and sysadmin interaction will be needed > to sort out the inevitable mess. This is a scenario that works, though. The server side opened /dev/vtpms1. It closed it and the clients side disappeared. Both devices /dev/vtpms1 and /dev/vtpmc1 were removed. The client side then shows the following: # ls -l /proc/self/fd total 0 lrwx------. 1 root root 64 Jan 20 21:34 0 -> /dev/pts/1 lrwx------. 1 root root 64 Jan 20 21:34 1 -> /dev/pts/1 lrwx------. 1 root root 64 Jan 20 21:34 100 -> /dev/vtpmc1 (deleted) lrwx------. 1 root root 64 Jan 20 21:34 2 -> /dev/pts/1 lr-x------. 1 root root 64 Jan 20 21:34 3 -> /proc/23266/fd # echo -en '\x00\xc1\x00\x00\x00\x0a\x00\x00\x00\x00' >&100 -bash: echo: write error: Input/output error Once the client did a 'exec 100>&-', so closed the fd, the /dev/vtpmc0 could actually reappear. There's a 'keepalive' flag in this vtpm driver that keeps the device pair around even if the server side shuts down. If that had been set, /dev/vtpmc1 wouldn't have been deleted upon server side closure. Stefan > > > > Then del becomes close() on the special fd > > And with that the client side device would *have* to close as well > > because the backend now is inaccessible? > > The client side would see failures on all TPM commands it tries to > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote: > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > > Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016 > > > 06:51:07 PM: > > > > > > > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > > > > > + pdev = platform_device_register_simple("tpm_vtpm", > > vtpm_dev->dev_num, > > > > > + NULL, 0); > > > > > > > > This seems strange, something like this should not be creating > > > > platform_devices. > > > Should it be a misc_device then ? > > > > No. Check what other virtual devices are doing.. > > register_chrdev maybe? ?? that doesn't replace a platform_device > > Except that isn't good enough - the IMA kernel side doesn't know that this > > tpm is now acting as the 'main' 'default' TPM. > > Hooking the vTPM to IMA requires another patch that I haven't shown since IMA > namespacing isn't public yet. Basically we implement another ioctl() that is to > be called before the clone() in order to 'reserve' a vtpm device pair for the > calling process. During the clone() call IMA namespacing code can query the > vTPM driver for a 'reserved' device pair. Hooking IMA up after the clone() may > also work but in case of docker/golang it's better to do this before since the > language libraries do a lot after the clone automatically. That sounds very complicated, wouldn't you just specify the TPM index to use in the IMA namespace when it is created? > So here things aren't so clear to me, either. Sysfs should really only show the > devices that are relevant to a container, but it seems to show a lot > more than I think what you are missing is that nobody uses mainline containers for the kind of strong isolation you are thinking about. Out-of-tree patches are used by those people and, as I understand it, they cover all these issues. So, in mainline, the correct thing to do is nothing, and realize that people who would care about pcr isolation between containers won't run mainline anyhow. Work with the out-of-tree people to make sure things work properly until they get the other bits in mainline. At least that is my impression of the state of affairs, someone more involved may know better. > > The huge downside to using a master side dev node is that these things > > will leak. Killing the vtpm daemon will not clean up the slave > > interface and another command and sysadmin interaction will be needed > > to sort out the inevitable mess. > > This is a scenario that works, though. So you already implemented the same semantics as I talked about, a clean up on close? Then just return the fd like I said. auto-delete a master char dev on close is a very strange API, don't do that. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/20/2016 10:21:15 PM: > > On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote: > > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > > > Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/19/2016 > > > > 06:51:07 PM: > > > > > > > > > > On Thu, Jan 14, 2016 at 11:01:57AM -0500, Stefan Berger wrote: > > > > > > + pdev = platform_device_register_simple("tpm_vtpm", > > > vtpm_dev->dev_num, > > > > > > + NULL, 0); > > > > > > > > > > This seems strange, something like this should not be creating > > > > > platform_devices. > > > > Should it be a misc_device then ? > > > > > > No. Check what other virtual devices are doing.. > > > > register_chrdev maybe? > > ?? that doesn't replace a platform_device > > > > Except that isn't good enough - the IMA kernel side doesn't knowthat this > > > tpm is now acting as the 'main' 'default' TPM. > > > > Hooking the vTPM to IMA requires another patch that I haven't > shown since IMA > > namespacing isn't public yet. Basically we implement another ioctl > () that is to > > be called before the clone() in order to 'reserve' a vtpm device > pair for the > > calling process. During the clone() call IMA namespacing code can query the > > vTPM driver for a 'reserved' device pair. Hooking IMA up after the > clone() may > > also work but in case of docker/golang it's better to do this > before since the > > language libraries do a lot after the clone automatically. > > That sounds very complicated, wouldn't you just specify the TPM index to use > in the IMA namespace when it is created? The IMA namespace is created as part of clone(). You cannot pass anything via clone(). So you either have to do it before or immediately after. If after is too later for whatever reason, you have to do it before. > > > So here things aren't so clear to me, either. Sysfs should really > only show the > > devices that are relevant to a container, but it seems to show a lot > > more than > > I think what you are missing is that nobody uses mainline containers > for the kind of strong isolation you are thinking about. Out-of-tree > patches are used by those people and, as I understand it, they cover > all these issues. ?? Out-of-tree patches? > > So, in mainline, the correct thing to do is nothing, and realize that > people who would care about pcr isolation between containers won't run > mainline anyhow. Work with the out-of-tree people to make sure things > work properly until they get the other bits in mainline. Now one just needs to find those poeple. Basically you suggest to ignore the potential leaking between containers. Just register with sysfs ? > > At least that is my impression of the state of affairs, someone more > involved may know better. > > > > The huge downside to using a master side dev node is that these things > > > will leak. Killing the vtpm daemon will not clean up the slave > > > interface and another command and sysadmin interaction will be needed > > > to sort out the inevitable mess. > > > > This is a scenario that works, though. > > So you already implemented the same semantics as I talked about, a clean > up on close? It's part of the existing patch, yes. A flag gives a choice to keep it around, or if not set and the server close()s, the pair disappears. > > Then just return the fd like I said. Any driver that can be used as an example ? > > auto-delete a master char dev on close is a very strange API, don't do > that. What I called cleanup can be trigger by the vTPM closing /dev/vtpms%d, so the server-side. What is the master for you? /dev/vtpmx where we run the ioctls on? Stefan > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 20, 2016 at 10:56:21PM -0500, Stefan Berger wrote: > The IMA namespace is created as part of clone(). You cannot pass anything via > clone(). So you either have to do it before or immediately after. If after is > too later for whatever reason, you have to do it before. Follow the pattern of something like the net ns. clone creates an 'empty' IMA namespace. After clone the child waits for the parent to complete its work. The parent creates a new vtpm /dev/tpm1 in it's namespace The parent does a kernel call to make tpm1 visiable in the child's IMA namespace as tpm0. (eg look at the 'ip link set .. netns ..' kind of scheme for inspiration) Hook the tpm core's id to struct tpm_chip code into the IMA namespace so that /dev/tpm0 and all other places actually access tpm1 when used in the child's IMA namespace The parent signals the child to continue to run. The child has a /dev/tpm0 that is actually routed to /dev/tpm1. The child is prevented from accessing other tpms. > > I think what you are missing is that nobody uses mainline containers > > for the kind of strong isolation you are thinking about. Out-of-tree > > patches are used by those people and, as I understand it, they cover > > all these issues. > > ?? Out-of-tree patches? Sorry, I don't recall the names of al lthe involved parties. lxr, and there is a notable company whose name eludes me right now. I had a very interesting discussion on this with James Bottomley at a conference once. Maybe look up device namespaces and look at the threads/etc on that for a clue what the status is? > Basically you suggest to ignore the potential leaking between containers. Just > register with sysfs ? Ignore in the sense that mainline doesn't have the tools to address the issue. If it is important to you then build out general capability in mainline, don't hack it like this :) Another approach would be to try and copy what netns does and link tpm visibility through a IMA namespace (as described above) directly to sysfs visibility - I have never looked at how that is done though :| > > Then just return the fd like I said. > > Any driver that can be used as an example ? The stuff in include/linux/anon_inodes.h is used to do this, eg a sequence of anon_inode_getfile, get_unused_fd_flags, fd_install will create a file descriptor in the calling process associated with a struct file_operations > > auto-delete a master char dev on close is a very strange API, don't do > > that. > > What I called cleanup can be trigger by the vTPM closing /dev/vtpms%d, so the > server-side. What is the master for you? /dev/vtpmx where we run the ioctls on? Sorry, master/slave is the ptx nomenclature for this scheme. Master would be your server side I think. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/21/2016 12:42:43 PM: > > On Wed, Jan 20, 2016 at 10:56:21PM -0500, Stefan Berger wrote: > > The IMA namespace is created as part of clone(). You cannot pass > anything via > > clone(). So you either have to do it before or immediately after. > If after is > > too later for whatever reason, you have to do it before. > > Follow the pattern of something like the net ns. > > clone creates an 'empty' IMA namespace. After clone the child waits > for the parent to complete its work. > > The parent creates a new vtpm /dev/tpm1 in it's namespace > > The parent does a kernel call to make tpm1 visiable in the child's > IMA namespace as tpm0. (eg look at the 'ip link set .. netns ..' kind > of scheme for inspiration) What is IMA namespace in relation to a device's name? The method is to read the major/minor numbers on the host and created /dev/tpm0 with the same major/minor numbers in the container's filesystem. The name doesn't matter I guess, but major/minor are important. > > Hook the tpm core's id to struct tpm_chip code into the IMA namespace > so that /dev/tpm0 and all other places actually access tpm1 when used > in the child's IMA namespace > > The parent signals the child to continue to run. The child has a > /dev/tpm0 that is actually routed to /dev/tpm1. The child is prevented > from accessing other tpms. The problem I have run into in particular with Docker and golang is that Docker invokes the golang function to run an external program. The golang function does a clone(), a whole lot of other stuff after it, and in the end the execve(). The code is here: https://golang.org/src/syscall/exec_linux.go Look at the function forkAndExecInChild on line 56++. The problem with that is that the execve() will trigger IMA measurements. IMA will refuse being hooked up with a vTPM driver if it couldn't put its first measurement(s) into a vTPM. I don't think we should queue PCR extensions until a device may eventually become available. So, the conclusion is, to accomodate golang (for example) we can create the device pair, sit the vTPM on top of the master, and reserve the device pair befor the next clone() so that IMA finds it and can hook up to it. What is wrong with this scheme? The ioctl for 'reservation' before the clone()? > > > > > Then just return the fd like I said. > > > > Any driver that can be used as an example ? > > The stuff in include/linux/anon_inodes.h is used to do this, eg > a sequence of anon_inode_getfile, get_unused_fd_flags, fd_install > will create a file descriptor in the calling process associated with a > struct file_operations > So let me show how things work currently. fd = open("/dev/vtpmx", ...); ioctl(fd, 'create device pair; TPM will be a TPM 2 version device') /* conveying that the device is a TPM 2 or 1.2 is important so that the kernel sends the right commands to the device and possibly also for sysfs handling */ /* at this point /dev/vtpmc0 and /dev/vtpms0 have been created */ clientfd = open("/dev/vtpmc0",...) serverfd = open("/dev/vtpms0",...); ... close(serverfd); /* /dev/vtpmc0 and /dev/vtpms0 disappeared */ Should it work like this? serverfd = open("/dev/vtpmx", ...); /* at this point /dev/vtpmc0 has been created and serverfd is used for the server side */ ioctl(fd, 'TPM will be a TPM 2 version device') /* setting flags on the chip->flags 'late' should be possible */ ... close(serverfd); /* /dev/vtpmc0 disappeared */ Regards, Stefan > > > auto-delete a master char dev on close is a very strange API, don't do > > > that. > > > > What I called cleanup can be trigger by the vTPM closing /dev/ > vtpms%d, so the > > server-side. What is the master for you? /dev/vtpmx where we run > the ioctls on? > > Sorry, master/slave is the ptx nomenclature for this scheme. Master > would be your server side I think. > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Thu, Jan 21, 2016 at 02:02:17PM -0500, Stefan Berger wrote: > What is IMA namespace in relation to a device's name? The method is to > read the major/minor numbers on the host and created /dev/tpm0 with the > same major/minor numbers in the container's filesystem. The name > doesn't matter I guess, but major/minor are important. Ostensibly we number the /dev/tpmX's in relation to the tpm index number. Internally to the kernel the TPM access is done by that tpm index. Today, IMA hard codes that index value to 0 (IIRC). I could see a future IMA allowing user space to specify the index. The index is also how to associate the /dev/tpm node with the /sysfs files. So the index is important, we'd want to control it for namespaces. In any case the tpm index is part of the contract, and it would be ideal if the IMA namespace made tpm index 0 be the right vtpm. > The problem I have run into in particular with Docker and golang is > that Docker invokes the golang function to run an external program. The > golang function does a clone(), a whole lot of other stuff after it, > and in the end the execve(). Well, ultimately that is a docker problem, as you describe IMA has a special new requirement where the IMA NS has to be setup quickly. > The code is here: > [1]https://golang.org/src/syscall/exec_linux.go > Look at the function forkAndExecInChildon line 56++. Well, that is just bad API design, sorry. The unix model of fork()/exec() is that the app gets a chance to adjust the environment between fork/exec, and this design, while easy to use, locks the app into the hard wired customization that forkAndExecInChild does. Maybe add a callback to SysProcAttr or something? Can't help you here. You can't let this influence the kernel UAPI design. > available. So, the conclusion is, to accomodate golang (for example) we > can create the device pair, sit the vTPM on top of the master, and > reserve the device pair befor the next clone() so that IMA finds it and > can hook up to it. > What is wrong with this scheme? The ioctl for 'reservation' before the > clone()? Yes, how does that sort of thing even make sense in a complex multi-threaded world? > Should it work like this? Sort of like this: controlfd = open("/dev/vtpmx", ...); ioctl(controlfd, CREATE_VTPM, &inargs, &outargs); serverfd = outargs.fd; // /dev/tpmX exists. X is returned in outargs.tpm_index, maybe return major/minor too child = clone(...) ioctl(??? , ASSIGN_VTPM_TO_NS, .. child->ima_ns .., to index = 0, from index = outargs.tpm_index); /* tpm index X is destroyed, kernel prevents reuse of index X until the NS is destroyed too. /dev/tpmX is removed by udev */ close(severfd); So, you'd probably make a vtpm daemon that took as execv args a reference to the IMA namespace to create the tpm in, and have docker launch it after the clone, but before the exec in the parent namespace. That is fairly similar to how net ns works, with the wrinkle you have to do this before the exec, I guess. It also allows hw tpms to be routed to the ns. The docker container just has the normal /dev/tpm0 Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/21/2016 02:30:49 PM: > > On Thu, Jan 21, 2016 at 02:02:17PM -0500, Stefan Berger wrote: > > What is IMA namespace in relation to a device's name? The method is to > > read the major/minor numbers on the host and created /dev/tpm0 with the > > same major/minor numbers in the container's filesystem. The name > > doesn't matter I guess, but major/minor are important. > > Ostensibly we number the /dev/tpmX's in relation to the tpm index > number. > > Internally to the kernel the TPM access is done by that tpm index. > > Today, IMA hard codes that index value to 0 (IIRC). I could see a > future IMA allowing user space to specify the index. The index is also > how to associate the /dev/tpm node with the /sysfs files. > > So the index is important, we'd want to control it for namespaces. > > In any case the tpm index is part of the contract, and it would be > ideal if the IMA namespace made tpm index 0 be the right vtpm. Got it. > > > The problem I have run into in particular with Docker and golang is > > that Docker invokes the golang function to run an external program. The > > golang function does a clone(), a whole lot of other stuff after it, > > and in the end the execve(). > > Well, ultimately that is a docker problem, as you describe IMA has a > special new requirement where the IMA NS has to be setup quickly. > > > The code is here: > > [1]https://golang.org/src/syscall/exec_linux.go > > Look at the function forkAndExecInChildon line 56++. > > Well, that is just bad API design, sorry. The unix model of fork()/exec() > is that the app gets a chance to adjust the environment between > fork/exec, and this design, while easy to use, locks the app into the > hard wired customization that forkAndExecInChild does. >From a callers' perspective it's attractive to have it do all kinds of stuff IF one doesn't have to go in between. Unfortunately we have to go in between. > > Maybe add a callback to SysProcAttr or something? Can't help you here. > > You can't let this influence the kernel UAPI design. The choice is between getting this working 'today' (even if just locally) or discussing this with golang designers, which in the ideal case would cause me waiting for the next version, dealing with that version dependency etc., plus the delay. So, clearly, an additional ioctl() and ~50 lines of code make this work 'now'. Doesn't this seem worth it? > > > available. So, the conclusion is, to accomodate golang (for example) we > > can create the device pair, sit the vTPM on top of the master, and > > reserve the device pair befor the next clone() so that IMA finds it and > > can hook up to it. > > What is wrong with this scheme? The ioctl for 'reservation' before the > > clone()? > > Yes, how does that sort of thing even make sense in a complex > multi-threaded world? controlfd = open("/dev/vtpmx, ..."); ioctl(controlfd, CREATE_VTPM, &inargs, &outargs); [...] ioctl(controlfd, RESERVE_VTPM_FOR_NS, ...params); clone() The possibility of passing that controlfd between threads is there if clone() was to happen in another thread ... (it currently doesn't). It shouldn't be a problem. > > > Should it work like this? > > Sort of like this: > > controlfd = open("/dev/vtpmx", ...); > ioctl(controlfd, CREATE_VTPM, &inargs, &outargs); > serverfd = outargs.fd; > // /dev/tpmX exists. X is returned in outargs.tpm_index, maybe > return major/minor too > > child = clone(...) > ioctl(??? , ASSIGN_VTPM_TO_NS, .. child->ima_ns .., to index = 0, > from index = outargs.tpm_index); after the clone() you are in that IMA namespace. So the only argument needed there is the index to the tpm_chip to hook up to the current IMA namespace. > > /* tpm index X is destroyed, kernel prevents reuse of index X > until the NS is destroyed too. /dev/tpmX is removed by udev */ > close(severfd); > > So, you'd probably make a vtpm daemon that took as execv args a > reference to the IMA namespace to create the tpm in, and have docker > launch it after the clone, but before the exec in the parent > namespace. I got that part with the fd, major & minor number. It seems to work. I have one ioctl to reserve for before the clone and another ioctl to hook IMA-NS and vTPM together after the clone, but that patch is for later. So let's not just kill the ioctl for 'reservation' like that, please. > > That is fairly similar to how net ns works, with the wrinkle you have > to do this before the exec, I guess. > > It also allows hw tpms to be routed to the ns. How many hardware TPMs are going to be there ? One? That is to be used for the host, right? Presumably the containers will all get emulated TPMs. And sharing the single hardware TPM between multiple containers just isn't possible. This will not be possible when going through the vTPM driver, but you have the ??? up there. I'd put the 'controlfd' in that place. The vTPM driver will only know about vtpm_dev->chip that it created and none of them is a hardware TPM. Stefan > > The docker container just has the normal /dev/tpm0 > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Thu, Jan 21, 2016 at 04:51:06PM -0500, Stefan Berger wrote: > > You can't let this influence the kernel UAPI design. > The choice is between getting this working 'today' (even if just > locally) or discussing this with golang designers, which in the ideal > case would cause me waiting for the next version, dealing with that > version dependency etc., plus the delay. So, clearly, an additional > ioctl() and ~50 lines of code make this work 'now'. Doesn't this seem > worth it? Sorry, for mainline stuff like this reserve thing is not clean enough to be acceptable. Why can't you just open-code a modified forkAndExecInChild in docker? Seriously, the golang code you showed already has special stuff to do user namespaces before the exec, it is totally unreasonable insist that other namespace stuff can't be done the same way. > > child = clone(...) > > ioctl(??? , ASSIGN_VTPM_TO_NS, .. child->ima_ns .., to index = 0, > > from index = outargs.tpm_index); > after the clone() you are in that IMA namespace. So the only argument > needed there is the index to the tpm_chip to hook up to the current IMA > namespace. No, all of the above was in the parent namespace, the clone creates the child IMA namespace and launches the thread, but only the parent would have enough permissions to actually share the TPM to the child - ie the child cannot self-join. You could also go the way of the mount namespace where the actions isn't an 'add to child namespace' but a 'remove from my namespace', but I don't think that makes as much sense for devices, the 'add' approach in line with the net ns seems cleaner. Others more familiar with namespaces may have other ideas, but I doubt you'd find anyone to support a reserve scheme. > I got that part with the fd, major & minor number. It seems to work. > I have one ioctl to reserve for before the clone and another ioctl to > hook IMA-NS and vTPM together after the clone, but that patch is for > later. So let's not just kill the ioctl for 'reservation' like that, > please. No, kill it now. It doesn't make sense as part of this series, it should have been in the IMA namespace patch anyhow. > > That is fairly similar to how net ns works, with the wrinkle you have > > to do this before the exec, I guess. > > > > It also allows hw tpms to be routed to the ns. > How many hardware TPMs are going to be there ? One? That is to be used > for the host, right? No idea. It depends on the application. The HW tpm is alot better than this vtpm idea for many use cases. I could see applications where you'd want to use the host PCRS because they cover the bios and kernel the app is actually running under. I actually have a hard time understanding what value fake software PCRS are to containers <shrug> I certainly wouldn't want to be forced to store keys in a software tpm just because I am using containers, for instance. > TPMs. And sharing the single hardware TPM between multiple containers > just isn't possible. Of course it is, it just hasn't been done yet, and won't be a 100% perfect emulation. > This will not be possible when going through the vTPM driver, but you > have the ??? up there. I'd put the 'controlfd' in that place. No, it should not be controlfd, it should be what ever API the rest of the IMA namespace stuff is using, I think. > The vTPM driver will only know about vtpm_dev->chip that it > created and none of them is a hardware TPM. Right, controlfd implies that only vtpms could be shared to a IMA namespace, which is a terrible API. This is another reason why reserved is a terrible API. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Thu, Jan 21, 2016 at 03:10:40PM -0700, Jason Gunthorpe wrote: > > The vTPM driver will only know about vtpm_dev->chip that it > > created and none of them is a hardware TPM. > > Right, controlfd implies that only vtpms could be shared to a IMA > namespace, which is a terrible API. This is another reason why > reserved is a terrible API. It's funny because for me the API looks robust, configurable and nice to use [1]. I could use it outside of virtualization space (I already mentioned two applications). Your criticism is for me way too abstract. I have hard time following it. This is a nice incremental step. By no means it's perfect but something that would satisfy a lot of users for things that they want do now (like me). [1] I've been still too busy to try this out in order to say the final word. > Jason /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On 01/21/2016 05:10 PM, Jason Gunthorpe wrote: > On Thu, Jan 21, 2016 at 04:51:06PM -0500, Stefan Berger wrote: > >> > You can't let this influence the kernel UAPI design. >> The choice is between getting this working 'today' (even if just >> locally) or discussing this with golang designers, which in the ideal >> case would cause me waiting for the next version, dealing with that >> version dependency etc., plus the delay. So, clearly, an additional >> ioctl() and ~50 lines of code make this work 'now'. Doesn't this seem >> worth it? > Sorry, for mainline stuff like this reserve thing is not clean enough > to be acceptable. > > Why can't you just open-code a modified forkAndExecInChild in docker? > > Seriously, the golang code you showed already has special stuff to do > user namespaces before the exec, it is totally unreasonable insist > that other namespace stuff can't be done the same way. I have not 'insisted' that this cannot be done. I have merely shown a work-around. It looks like the golan impl. would need some sort of (optional) synchronization between the child and the parent in order for the parent to hook the vTPM to the IMA namespace that the child just created if the parent is the only one allowed to do that. The hook-up would have to happen before the execve()... > >> > child = clone(...) >> > ioctl(??? , ASSIGN_VTPM_TO_NS, .. child->ima_ns .., to index = 0, >> > from index = outargs.tpm_index); >> after the clone() you are in that IMA namespace. So the only argument >> needed there is the index to the tpm_chip to hook up to the current IMA >> namespace. > No, all of the above was in the parent namespace, the clone creates > the child IMA namespace and launches the thread, but only the parent > would have enough permissions to actually share the TPM to the child - > ie the child cannot self-join. Ok, so I guess the ioctl would take the PID of the created child as a parameter (how else to identify it?). The kernel would find that child, check that the calling parent is indeed the parent of the child, and if everything looks good do the hook-up. The reservation works similar in so far as the 'reservation' would lookup the calling processe's PID and associate that with the vTPM device pair instance on which the ioctl was called. Once the child is created, IMA namspacing looks for a reserved vTPM device pair instance by matching up PIDs. > > You could also go the way of the mount namespace where the actions > isn't an 'add to child namespace' but a 'remove from my namespace', > but I don't think that makes as much sense for devices, the 'add' > approach in line with the net ns seems cleaner. Others more familiar > with namespaces may have other ideas, but I doubt you'd find anyone to > support a reserve scheme. > >> I got that part with the fd, major & minor number. It seems to work. >> I have one ioctl to reserve for before the clone and another ioctl to >> hook IMA-NS and vTPM together after the clone, but that patch is for >> later. So let's not just kill the ioctl for 'reservation' like that, >> please. > No, kill it now. It doesn't make sense as part of this series, it > should have been in the IMA namespace patch anyhow. > >> > That is fairly similar to how net ns works, with the wrinkle you have >> > to do this before the exec, I guess. >> > >> > It also allows hw tpms to be routed to the ns. >> How many hardware TPMs are going to be there ? One? That is to be used >> for the host, right? > No idea. It depends on the application. The HW tpm is alot better than > this vtpm idea for many use cases. I could see applications where > you'd want to use the host PCRS because they cover the bios and kernel > the app is actually running under. I actually have a hard time > understanding what value fake software PCRS are to containers <shrug> I would just say that both use-cases exist, one with a hardware TPM and one with each container having a vTPM. The issue is that I don't see how well containers can share the single hardware TPM and its PCRs. > > I certainly wouldn't want to be forced to store keys in a software tpm > just because I am using containers, for instance. IMO, sharing the single hardware TPM between containers is an unsolved problem. > >> TPMs. And sharing the single hardware TPM between multiple containers >> just isn't possible. > Of course it is, it just hasn't been done yet, and won't be a 100% > perfect emulation. How do you handle access to PCRs? How do you handle sealing against PCRs? > >> This will not be possible when going through the vTPM driver, but you >> have the ??? up there. I'd put the 'controlfd' in that place. > No, it should not be controlfd, it should be what ever API the rest of > the IMA namespace stuff is using, I think. > >> The vTPM driver will only know about vtpm_dev->chip that it >> created and none of them is a hardware TPM. > Right, controlfd implies that only vtpms could be shared to a IMA > namespace, which is a terrible API. This is another reason why > reserved is a terrible API. Thanks for all feedback. I will eventually post v2. For now it's to be found here: https://github.com/stefanberger/linux The branch is easy to find. Stefan ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Fri, Jan 22, 2016 at 10:09:16AM -0500, Stefan Berger wrote: > Ok, so I guess the ioctl would take the PID of the created child as a > parameter (how else to identify it?). The kernel would find that child, > check that the calling parent is indeed the parent of the child, and if > everything looks good do the hook-up. That is one choice, it is worth reviewing what, eg, the net folks did for their API. They have already thought about this alot. > >Right, controlfd implies that only vtpms could be shared to a IMA > >namespace, which is a terrible API. This is another reason why > >reserved is a terrible API. > > Thanks for all feedback. I will eventually post v2. For now it's to be found > here: > > https://github.com/stefanberger/linux > > The branch is easy to find. That looks much nicer, yes. I'd also merge the tpm-vtpm.h into the .c file However, I'm still not sure this is right: + vtpm_dev->chip = tpmm_chip_alloc(&vtpm_dev->dev, &vtpm_tpm_ops); The only issue is error unwind, the tpmm version assumes devm works for that, but the vtpm_dev will not be destroyed if the tpm attach fails, do devm is not appropriate. As I said before the tpmm stuff was a hack I did to make it easier to migrate the large number of drivers, core code should not be relying on devm like that... One good option here would be unwind that a bit and create a tpm_chip_alloc/tpm_chip_register flow that did not use devm at all. That could be fairly straight forward.. Also, what is up with the _vtpm prefix on some functions? That doesn't seem aligned with the kernel style.. And confused why there is a miscdev and a alloc_chrdev_region ? Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/25/2016 01:10:46 PM: > > On Fri, Jan 22, 2016 at 10:09:16AM -0500, Stefan Berger wrote: > > > > Thanks for all feedback. I will eventually post v2. For now it's to be found > > here: > > > > https://github.com/stefanberger/linux > > > > The branch is easy to find. > > That looks much nicer, yes. > > I'd also merge the tpm-vtpm.h into the .c file Ok, I can do that. > > However, I'm still not sure this is right: > > + vtpm_dev->chip = tpmm_chip_alloc(&vtpm_dev->dev, &vtpm_tpm_ops); > > The only issue is error unwind, the tpmm version assumes devm works > for that, but the vtpm_dev will not be destroyed if the tpm attach > fails, do devm is not appropriate. > > As I said before the tpmm stuff was a hack I did to make it easier to > migrate the large number of drivers, core code should not be relying > on devm like that... One good option here would be unwind that a bit > and create a tpm_chip_alloc/tpm_chip_register flow that did not use > devm at all. That could be fairly straight forward.. > > Also, what is up with the _vtpm prefix on some functions? That doesn't > seem aligned with the kernel style.. Renamed them. > > And confused why there is a miscdev and a alloc_chrdev_region ? miscdev = /dev/vtpmx - that should be ok, no ? So, I just removed alloc_chrdev_region and with that the assignment of a major+minor to the virtual device. This works fine on device creation but on device destruction something odd happens in sysfs. With two 'vtpmctrl' test programs running sysfs looks like this: # find /sys/devices/virtual/vtpm/ /sys/devices/virtual/vtpm/ /sys/devices/virtual/vtpm/vtpms0 /sys/devices/virtual/vtpm/vtpms0/dev /sys/devices/virtual/vtpm/vtpms0/durations ... /sys/devices/virtual/vtpm/vtpms0/tpm0 /sys/devices/virtual/vtpm/vtpms0/tpm0/dev ... /sys/devices/virtual/vtpm/vtpms1 /sys/devices/virtual/vtpm/vtpms1/dev /sys/devices/virtual/vtpm/vtpms1/durations ... /sys/devices/virtual/vtpm/vtpms1/tpm1 /sys/devices/virtual/vtpm/vtpms1/tpm1/dev ... /sys/devices/virtual/vtpm/vtpms1/tpm1/power /sys/devices/virtual/vtpm/vtpms1/tpm1/power/control Now one of the test process 'vtpmctrl' terminates: /sys/devices/virtual/vtpm/ /sys/devices/virtual/vtpm/vtpms1 /sys/devices/virtual/vtpm/vtpms1/power /sys/devices/virtual/vtpm/vtpms1/power/control /sys/devices/virtual/vtpm/vtpms1/power/async /sys/devices/virtual/vtpm/vtpms1/power/runtime_enabled /sys/devices/virtual/vtpm/vtpms1/power/runtime_active_kids /sys/devices/virtual/vtpm/vtpms1/power/runtime_active_time /sys/devices/virtual/vtpm/vtpms1/power/autosuspend_delay_ms /sys/devices/virtual/vtpm/vtpms1/power/runtime_status /sys/devices/virtual/vtpm/vtpms1/power/runtime_usage /sys/devices/virtual/vtpm/vtpms1/power/runtime_suspended_time /sys/devices/virtual/vtpm/vtpms1/subsystem /sys/devices/virtual/vtpm/vtpms1/uevent That's all that is left now. .../vtpms1/tpm1/ has also already disappeared. Needless to say, the kernel is very unhappy once the other vtpmctrl terminates. The virtual device needs a major/minor as well. Are there any other possible solutions? Stefan ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Hi Apologies if I'm asking things that have popped out already in this discussion (and also being passive so far!). I'm still in the process of reading the discussion about [3/4] of the patch set that Stefan sent. I should have done that and read the source code properly before sending any responses. Sorry for being unforgivable sloppy with this. > > Thanks for all feedback. I will eventually post v2. For now it's to be found > > here: > > > > https://github.com/stefanberger/linux > > > > The branch is easy to find. > > That looks much nicer, yes. > > I'd also merge the tpm-vtpm.h into the .c file > > However, I'm still not sure this is right: > > + vtpm_dev->chip = tpmm_chip_alloc(&vtpm_dev->dev, &vtpm_tpm_ops); > > The only issue is error unwind, the tpmm version assumes devm works > for that, but the vtpm_dev will not be destroyed if the tpm attach > fails, do devm is not appropriate. At the moment tpmm_chip_alloc() does not use devm for anything. Are you speaking about stuff that happens between alloc and register as some of the drivers use devm to associate resources to pdev at that point? What do you mean by "tpm attach" here? Are you talking about the ioctl for the vtpmx device that creates the device pair? Doesn't the fallback path call destroy_device()? I'm not sure which devm associated resources you are talking about. > As I said before the tpmm stuff was a hack I did to make it easier to > migrate the large number of drivers, core code should not be relying > on devm like that... One good option here would be unwind that a bit > and create a tpm_chip_alloc/tpm_chip_register flow that did not use > devm at all. That could be fairly straight forward.. AFAIK I implemented two phase initialization or are you talking about something else? > Also, what is up with the _vtpm prefix on some functions? That doesn't > seem aligned with the kernel style.. > > And confused why there is a miscdev and a alloc_chrdev_region ? > > Jason /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Mon, Jan 25, 2016 at 05:46:52PM -0800, Jarkko Sakkinen wrote: > > The only issue is error unwind, the tpmm version assumes devm works > > for that, but the vtpm_dev will not be destroyed if the tpm attach > > fails, do devm is not appropriate. > > At the moment tpmm_chip_alloc() does not use devm for anything. Are you > speaking about stuff that happens between alloc and register as some of > the drivers use devm to associate resources to pdev at that point? Ugh, well I missed that when you made those patches. You need to put the devm back ASAP, as it is all the drivers now have broken error handling. eg static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, acpi_handle acpi_dev_handle) [..] chip = tpmm_chip_alloc(dev, &tpm_tis); [..] if (IS_ERR(chip)) return PTR_ERR(chip); Obviously leaks chip. The function is called tpmm_ specifically because it is supposed to use devm resource unwind, it was a big mistake to remove the devm and not rename the function or update the comments. With the change to use a struct device the devm action should have been a device_put to pair with the device_initialize. > > As I said before the tpmm stuff was a hack I did to make it easier to > > migrate the large number of drivers, core code should not be relying > > on devm like that... One good option here would be unwind that a bit > > and create a tpm_chip_alloc/tpm_chip_register flow that did not use > > devm at all. That could be fairly straight forward.. > > AFAIK I implemented two phase initialization or are you talking about > something else? Right, I was confused because I wrote a different devm thing for TPM that I discarded when I did the first rework. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Mon, Jan 25, 2016 at 08:05:14PM -0500, Stefan Berger wrote: > > And confused why there is a miscdev and a alloc_chrdev_region ? > > miscdev = /dev/vtpmx - that should be ok, no ? Yes > So, I just removed alloc_chrdev_region and with that the assignment of a > major+minor to the virtual device. This works fine on device creation but on > device destruction something odd happens in sysfs. > With two 'vtpmctrl' test programs running sysfs looks like this: > > # find /sys/devices/virtual/vtpm/ > /sys/devices/virtual/vtpm/ > /sys/devices/virtual/vtpm/vtpms0 > /sys/devices/virtual/vtpm/vtpms0/dev Ugh. So, in an ideal world vtpms0 wouldn't even exist. The need for it is more tpm core breakage. You could avoid this by having the tpm core not create sysfs files on the parent for the vtpms. I think we are very close to being able to do that. The it would just be /sys/devices/virtual/tpm0/dev If vtpms0 is kept, then it shouldn't have a 'dev': > /sys/devices/virtual/vtpm/vtpms0/dev The plus side is that it should be able to act as the devm container for the tpmm_ function. This should be the only dev: > /sys/devices/virtual/vtpm/vtpms0/tpm0/dev That is a big clue something is very wrong if the above exists but the alloc_chrdev_region was removed. > That's all that is left now. .../vtpms1/tpm1/ has also already disappeared. > Needless to say, the kernel is very unhappy once the other vtpmctrl terminates. > > The virtual device needs a major/minor as well. Are there any other possible > solutions? This bad behavior is some secondary bug.. Just looking around I think all the lifetime stuff needs a good go over. I noticed at least... This looks wrong: +static int vtpm_fops_release(struct inode *inode, struct file *filp) + put_device(&vtpm_dev->dev); + + _vtpm_delete_device_pair(vtpm_dev); put_device should be last because it can kfree(vtpm_dev) This looks questionable too: + device_destroy(vtpm_class, vtpm_dev->dev.devt); The driver never calls device_create, it shouldn't call device_destroy. The device_initialize transfers kref ownership to the caller, so vtpm_dev->dev needs a matching device_put on all error paths, not device_destroy IMHO, the put_device in the fops->release should be the match to device_initialize, and it should be the last thing. Audit eveything else to make that true This looks wrong too: + /* device is needed for core TPM driver */ + device_initialize(&vtpm_dev->dev); + vtpm_dev->dev.devt = devt; + vtpm_dev->dev.class = vtpm_class; + vtpm_dev->dev.release = vtpm_dev_release; I usualle like to put device_initialize last in that sequence, drop the devt stuff so you don't get a 'dev' The RCU stuff looks off too - where is the synchronize_rcu ? Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Mon, Jan 25, 2016 at 08:19:19PM -0700, Jason Gunthorpe wrote: > On Mon, Jan 25, 2016 at 05:46:52PM -0800, Jarkko Sakkinen wrote: > > > The only issue is error unwind, the tpmm version assumes devm works > > > for that, but the vtpm_dev will not be destroyed if the tpm attach > > > fails, do devm is not appropriate. > > > > At the moment tpmm_chip_alloc() does not use devm for anything. Are you > > speaking about stuff that happens between alloc and register as some of > > the drivers use devm to associate resources to pdev at that point? > > Ugh, well I missed that when you made those patches. > > You need to put the devm back ASAP, as it is all the drivers now have > broken error handling. eg > > static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, > acpi_handle acpi_dev_handle) > [..] > chip = tpmm_chip_alloc(dev, &tpm_tis); > [..] > if (IS_ERR(chip)) > return PTR_ERR(chip); > > Obviously leaks chip. Could you describe how it leaks a chip and what do you mean by verb "leak"? > The function is called tpmm_ specifically because it is supposed to > use devm resource unwind, it was a big mistake to remove the devm and > not rename the function or update the comments. The function does not reserve any resources using devm. Should it? > With the change to use a struct device the devm action should have > been a device_put to pair with the device_initialize. Well, I found a regression that is luckily a rare one. It does not happen when tpm_chip_register() is called because then it will be paired with tpm_chip_unregister(), which calls device_unregister(). I injected error to tpm_crb initialization in-between alloc and register and release-callback was never called. When I did insmod with a working driver it got device name /dev/tpm1. So now I do have regression that does show up. If you don't have it, it does not make sense to "fix" the code even if it looks obviously wrong. I have to ask one question. In earlier response to you stated that one should get rid of devm? How'd you fix this problem without devm and why you want to get rid of it in this scenario? Thanks. /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/25/2016 10:46:32 PM: > > On Mon, Jan 25, 2016 at 08:05:14PM -0500, Stefan Berger wrote: > > > > And confused why there is a miscdev and a alloc_chrdev_region ? > > > > miscdev = /dev/vtpmx - that should be ok, no ? > > Yes > > > So, I just removed alloc_chrdev_region and with that the assignment of a > > major+minor to the virtual device. This works fine on device creation but on > > device destruction something odd happens in sysfs. > > > With two 'vtpmctrl' test programs running sysfs looks like this: > > > > # find /sys/devices/virtual/vtpm/ > > /sys/devices/virtual/vtpm/ > > /sys/devices/virtual/vtpm/vtpms0 > > /sys/devices/virtual/vtpm/vtpms0/dev > > Ugh. > > So, in an ideal world vtpms0 wouldn't even exist. The need for it is > more tpm core breakage. You could avoid this by having the tpm core > not create sysfs files on the parent for the vtpms. I think we are > very close to being able to do that. We're not explicitly registering the parent with sysfs, yet it appears there. How do you prevent it from appearing there? The only other solution I could think of is being able to pass a NULL for 'dev' into the existing tpmm_chip_alloc() and avoid the parent device altogether. > > The it would just be /sys/devices/virtual/tpm0/dev > > If vtpms0 is kept, then it shouldn't have a 'dev': Now it doesn't have it anymore. > > > /sys/devices/virtual/vtpm/vtpms0/dev > > The plus side is that it should be able to act as the devm container > for the tpmm_ function. > > This should be the only dev: > > > /sys/devices/virtual/vtpm/vtpms0/tpm0/dev > > That is a big clue something is very wrong if the above exists but > the alloc_chrdev_region was removed. > > > That's all that is left now. .../vtpms1/tpm1/ has also already disappeared. > > Needless to say, the kernel is very unhappy once the other > vtpmctrl terminates. > > > > The virtual device needs a major/minor as well. Are there any other possible > > solutions? > > This bad behavior is some secondary bug.. Just looking around I think > all the lifetime stuff needs a good go over. I noticed at least... > > This looks wrong: > > +static int vtpm_fops_release(struct inode *inode, struct file *filp) > > + put_device(&vtpm_dev->dev); > + > + _vtpm_delete_device_pair(vtpm_dev); > > put_device should be last because it can kfree(vtpm_dev) Yes. Fixed. > > This looks questionable too: > > + device_destroy(vtpm_class, vtpm_dev->dev.devt); > > The driver never calls device_create, it shouldn't call > device_destroy. My bad! This now has to be device_del() + put_device() and then it works. > > The device_initialize transfers kref ownership to the caller, so > vtpm_dev->dev needs a matching device_put on all error paths, not > device_destroy > > IMHO, the put_device in the fops->release should be the match to > device_initialize, and it should be the last thing. Audit eveything > else to make that true > > This looks wrong too: > > + /* device is needed for core TPM driver */ > + device_initialize(&vtpm_dev->dev); > + vtpm_dev->dev.devt = devt; > + vtpm_dev->dev.class = vtpm_class; > + vtpm_dev->dev.release = vtpm_dev_release; > > I usualle like to put device_initialize last in that sequence, drop > the devt stuff so you don't get a 'dev' I did that in the same order as what 'device_create' ends up doing : 1697 device_initialize(dev); 1698 dev->devt = devt; 1699 dev->class = class; 1700 dev->parent = parent; 1701 dev->groups = groups; 1702 dev->release = device_create_release; 1703 dev_set_drvdata(dev, drvdata); http://lxr.free-electrons.com/source/drivers/base/core.c#L1680 Usage of 'devt' is gone. I put the fixed version of the driver here: https://github.com/stefanberger/linux/commits/vtpm-driver.v2 Thanks for the review. Stefan > > The RCU stuff looks off too - where is the synchronize_rcu ? > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Tue, Jan 26, 2016 at 05:56:59AM -0800, Jarkko Sakkinen wrote: > On Mon, Jan 25, 2016 at 08:19:19PM -0700, Jason Gunthorpe wrote: > > On Mon, Jan 25, 2016 at 05:46:52PM -0800, Jarkko Sakkinen wrote: > > > > The only issue is error unwind, the tpmm version assumes devm works > > > > for that, but the vtpm_dev will not be destroyed if the tpm attach > > > > fails, do devm is not appropriate. > > > > > > At the moment tpmm_chip_alloc() does not use devm for anything. Are you > > > speaking about stuff that happens between alloc and register as some of > > > the drivers use devm to associate resources to pdev at that point? > > > > Ugh, well I missed that when you made those patches. > > > > You need to put the devm back ASAP, as it is all the drivers now have > > broken error handling. eg > > > > static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, > > acpi_handle acpi_dev_handle) > > [..] > > chip = tpmm_chip_alloc(dev, &tpm_tis); > > [..] > > if (IS_ERR(chip)) > > return PTR_ERR(chip); > > > > Obviously leaks chip. > > Could you describe how it leaks a chip and what do you mean by verb > "leak"? It looks like you alread figured this out. release is never called, so the kalloc for chip is never freed. > > The function is called tpmm_ specifically because it is supposed to > > use devm resource unwind, it was a big mistake to remove the devm and > > not rename the function or update the comments. > > The function does not reserve any resources using devm. Should it? It originally did, but commit 313d21eeab9282e01fdcecd40e9ca87e0953627f took it out and didn't change the function name. The original version looked like: struct tpm_chip *tpmm_chip_alloc(struct device *dev, const struct tpm_class_ops *ops) { [..] devm_add_action(dev, tpmm_chip_remove, chip); dev_set_drvdata(dev, chip); return chip; } > > With the change to use a struct device the devm action should have > > been a device_put to pair with the device_initialize. > > Well, I found a regression that is luckily a rare one. It isn't rare, any error path will cause this. > So now I do have regression that does show up. If you don't have it, > it does not make sense to "fix" the code even if it looks obviously > wrong. That isn't the kernel way, this is very obviosly a bug, I don't need to see 'proof' to know it needs fixing. > I have to ask one question. In earlier response to you stated that one > should get rid of devm? How'd you fix this problem without devm and why > you want to get rid of it in this scenario? To get rid of devm each driver would have to be updated to have the right goto-error unwind to device_put the chip. It is alot of tricky work to do this, which is why I suggested the tpmm_ scheme when you did the original two step alloc patches. For drivers like Stefan's non-devm might be needed, I'm still not clear on that to be honest. There are rules when devm can be used and not, and using it outside a probe() function or similar is something I'm not sure works.. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Tue, Jan 26, 2016 at 09:21:42AM -0500, Stefan Berger wrote: > > So, in an ideal world vtpms0 wouldn't even exist. The need for it is > > more tpm core breakage. You could avoid this by having the tpm core > > not create sysfs files on the parent for the vtpms. I think we are > > very close to being able to do that. > We're not explicitly registering the parent with sysfs, yet it appears > there. How do you prevent it from appearing there? static struct kobject *get_device_parent(struct device *dev, struct device *parent) /* * If we have no parent, we live in "virtual". * Class-devices with a non class-device as parent, live * in a "glue" directory to prevent namespace collisions. */ The vtpm should be under virtual, the question is if we need vtpm and and tpm devices, ie should the vtpm driver be calling class_create or not. I don't have an answer to that, are there other virtualization devices you can reference as an example? Ideally you'd get someone familiar with sysfs to confirm these details. > The only other solution I could think of is being able to pass a > NULL for 'dev' into the existing tpmm_chip_alloc() and avoid the > parent device altogether. This would drop the vtpm and put tpm directly under virtual, however I think the tpm core currently will segfault if parent is null, that could be fixed if that is the way this should go. >> put_device should be last because it can kfree(vtpm_dev) > Yes. Fixed. Not on your github... +static int vtpm_fops_release(struct inode *inode, struct file *filp) +{ + struct vtpm_dev *vtpm_dev = filp->private_data; + + filp->private_data = NULL; + + put_device(&vtpm_dev->dev); + + vtpm_delete_device_pair(vtpm_dev); + + return 0; +} Don't put something then continue to reference it after. Also, I would probably drop the get_device in vtpm_fops_open, use the implicit get_device of device_initialize instead. Then drop the put_device in vtpm_destroy_vtpm_dev and use only one at the end of the fops release. So the flow is, device_initialize [...] create a flip, transfer the kref from device_initialize to the flip [..] Which makes the error flow very sane, once the flip is created, deleting the flip will clean up everything. > I put the fixed version of the driver here: > https://github.com/stefanberger/linux/commits/vtpm-driver.v2 Please go through all the error handling, it still looks wrong: +static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl, + unsigned long arg) + vtpm_dev = vtpm_create_device_pair(&vtpm_new_pair); [..] + if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair, + sizeof(vtpm_new_pair))) + return -EFAULT; Can't be right, that fails the ioctl but still creates the FD and TPM. The goto unwind of vtpm_create_device_pair looks wrong too, double calls vtpm_destroy_vtpm_dev and other. See above on how the flip release should be relied upon to do that once the flip is created. Also I'm wondering about the placement of tpm_chip_register, long term we are going to be moving tpm setup commands under tpm_chip_register, so the driver must be fully ready to handle commands when it is called, so I think vtpm calls it too early. For instance, get_timeouts is never called in vtpm, which is not correct. The soft tpm should have an opportunity to tell the kernel the timeout values. I think you have to fix this by calling tpm_get_timeouts/chip_register from a work queue thread created before the ioctl returns. That should avoid deadlock.. This is also very wrong: +struct vtpm_dev { + struct kref kref; + + struct device dev; Structs should only ever have one kref. Get rid of 'kref' and use the kref embedded in 'dev' via get_device/put_device. Also, what is the point of 'vtpm_dev_get_by_chip' ? Just store the vtpm_dev in the chip's private data pointer like every other driver. No special locking is needed, tpm core guarentees no op is running or will ever be called again once tpm_unregister returns. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/26/2016 01:22:48 PM: > > On Tue, Jan 26, 2016 at 09:21:42AM -0500, Stefan Berger wrote: > > > So, in an ideal world vtpms0 wouldn't even exist. The need for it is > > > more tpm core breakage. You could avoid this by having the tpm core > > > not create sysfs files on the parent for the vtpms. I think we are > > > very close to being able to do that. > > > We're not explicitly registering the parent with sysfs, yet it appears > > there. How do you prevent it from appearing there? > > static struct kobject *get_device_parent(struct device *dev, > struct device *parent) > > /* > * If we have no parent, we live in "virtual". > * Class-devices with a non class-device as parent, live > * in a "glue" directory to prevent namespace collisions. > */ > > The vtpm should be under virtual, the question is if we need vtpm and > and tpm devices, ie should the vtpm driver be calling class_create or > not. > > I don't have an answer to that, are there other virtualization devices > you can reference as an example? I don't have a virtual device example that doesn't have a parent. I had looked at virtio_console for, but that one also has a parent. http://lxr.free-electrons.com/source/drivers/char/virtio_console.c#L1439 > > Ideally you'd get someone familiar with sysfs to confirm these details. > > > The only other solution I could think of is being able to pass a > > NULL for 'dev' into the existing tpmm_chip_alloc() and avoid the > > parent device altogether. > > This would drop the vtpm and put tpm directly under virtual, however I > think the tpm core currently will segfault if parent is null, that > could be fixed if that is the way this should go. Likely the parent isn't needed if the vtpm driver calls it. So a couple of checks for dev being NULL could solve that. But I rather leave it as is for now before I tear code out that I may have to re-add later on. > > >> put_device should be last because it can kfree(vtpm_dev) > > Yes. Fixed. > > Not on your github... > > +static int vtpm_fops_release(struct inode *inode, struct file *filp) > +{ > + struct vtpm_dev *vtpm_dev = filp->private_data; > + > + filp->private_data = NULL; > + > + put_device(&vtpm_dev->dev); > + > + vtpm_delete_device_pair(vtpm_dev); > + > + return 0; > +} > > Don't put something then continue to reference it after. > > Also, I would probably drop the get_device in vtpm_fops_open, use the > implicit get_device of device_initialize instead. Then drop the > put_device in vtpm_destroy_vtpm_dev and use only one at the end of the > fops release. We have the implicit get_device in vtpm_create_vtpm_dev and would therefore try to keep the put_device in vtpm_destroy_vtpm_dev. Besides that a device_initialize + device_add = device_register and device_del + put_device = device_unregister, which could go into vtpm_create_vtpm_dev and vtpm_destroy_vtpm_dev respectively. > > So the flow is, device_initialize [...] create a flip, transfer the > kref from device_initialize to the flip [..] > > Which makes the error flow very sane, once the flip is created, > deleting the flip will clean up everything. > > > I put the fixed version of the driver here: > > https://github.com/stefanberger/linux/commits/vtpm-driver.v2 > > Please go through all the error handling, it still looks wrong: > > +static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl, > + unsigned long arg) > + vtpm_dev = vtpm_create_device_pair(&vtpm_new_pair); > [..] > + if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair, > + sizeof(vtpm_new_pair))) > + return -EFAULT; > > Can't be right, that fails the ioctl but still creates the FD and TPM. True. I have not been able to undo fd_install so far and cannot find any other code that does that. So, I pulled that out and put it after the successful copy_to_user(). > > The goto unwind of vtpm_create_device_pair looks wrong too, double > calls vtpm_destroy_vtpm_dev and other. See above on how the flip > release should be relied upon to do that once the flip is created. > > Also I'm wondering about the placement of tpm_chip_register, long > term we are going to be moving tpm setup commands under > tpm_chip_register, so the driver must be fully ready to handle > commands when it is called, so I think vtpm calls it too early. re-ordered > > For instance, get_timeouts is never called in vtpm, which is not > correct. The soft tpm should have an opportunity to tell the kernel > the timeout values. That's seems a chick-and-egg problem. The vTPM can only be started once the ioctl has provided the fd. > > I think you have to fix this by calling tpm_get_timeouts/chip_register > from a work queue thread created before the ioctl returns. That should > avoid deadlock.. We'll only ever be able to call that once the vTPM has been started on the fd... otherwise the default timeout values will have to do it. > > This is also very wrong: > > +struct vtpm_dev { > + struct kref kref; > + > + struct device dev; > > Structs should only ever have one kref. Get rid of 'kref' and use the > kref embedded in 'dev' via get_device/put_device. Yes, sir. > > Also, what is the point of 'vtpm_dev_get_by_chip' ? Just store the > vtpm_dev in the chip's private data pointer like every other driver. Introduced chip->priv. > > No special locking is needed, tpm core guarentees no op is running or > will ever be called again once tpm_unregister returns. > > Jason > cheers! Stefan ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Tue, Jan 26, 2016 at 10:58:16AM -0700, Jason Gunthorpe wrote: > On Tue, Jan 26, 2016 at 05:56:59AM -0800, Jarkko Sakkinen wrote: > > On Mon, Jan 25, 2016 at 08:19:19PM -0700, Jason Gunthorpe wrote: > > > On Mon, Jan 25, 2016 at 05:46:52PM -0800, Jarkko Sakkinen wrote: > > > > > The only issue is error unwind, the tpmm version assumes devm works > > > > > for that, but the vtpm_dev will not be destroyed if the tpm attach > > > > > fails, do devm is not appropriate. > > > > > > > > At the moment tpmm_chip_alloc() does not use devm for anything. Are you > > > > speaking about stuff that happens between alloc and register as some of > > > > the drivers use devm to associate resources to pdev at that point? > > > > > > Ugh, well I missed that when you made those patches. > > > > > > You need to put the devm back ASAP, as it is all the drivers now have > > > broken error handling. eg > > > > > > static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, > > > acpi_handle acpi_dev_handle) > > > [..] > > > chip = tpmm_chip_alloc(dev, &tpm_tis); > > > [..] > > > if (IS_ERR(chip)) > > > return PTR_ERR(chip); > > > > > > Obviously leaks chip. > > > > Could you describe how it leaks a chip and what do you mean by verb > > "leak"? > > It looks like you alread figured this out. release is never called, so > the kalloc for chip is never freed. > > > > The function is called tpmm_ specifically because it is supposed to > > > use devm resource unwind, it was a big mistake to remove the devm and > > > not rename the function or update the comments. > > > > The function does not reserve any resources using devm. Should it? > > It originally did, but commit 313d21eeab9282e01fdcecd40e9ca87e0953627f > took it out and didn't change the function name. > > The original version looked like: > > struct tpm_chip *tpmm_chip_alloc(struct device *dev, > const struct tpm_class_ops *ops) > { > [..] > devm_add_action(dev, tpmm_chip_remove, chip); > dev_set_drvdata(dev, chip); > > return chip; > } > > > > With the change to use a struct device the devm action should have > > > been a device_put to pair with the device_initialize. > > > > Well, I found a regression that is luckily a rare one. > > It isn't rare, any error path will cause this. > > > So now I do have regression that does show up. If you don't have it, > > it does not make sense to "fix" the code even if it looks obviously > > wrong. > > That isn't the kernel way, this is very obviosly a bug, I don't need > to see 'proof' to know it needs fixing. > > > I have to ask one question. In earlier response to you stated that one > > should get rid of devm? How'd you fix this problem without devm and why > > you want to get rid of it in this scenario? > > To get rid of devm each driver would have to be updated to have the > right goto-error unwind to device_put the chip. It is alot of tricky > work to do this, which is why I suggested the tpmm_ scheme when you > did the original two step alloc patches. > > For drivers like Stefan's non-devm might be needed, I'm still not > clear on that to be honest. There are rules when devm can be used and > not, and using it outside a probe() function or similar is something > I'm not sure works.. OK, thanks for your comments! The important thing is that we are on the same page in this and rest is cutting hairs. I'll deliver a bug fix for this. > Jason /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > Presumably some namespace magic can be used to make them show up as > > tpm0 in a container? > > The magic is to have the container management stack create the device pair. > From the ioctl it learns the name of the devices that were created and it then > finds out about the major/minor number of the created device and have /dev/tpm0 > with that major/minor created in the container's /dev directory. Is the device created before container launched? I would assume that this would work user space accesses through /dev/tpm0. I don't know how this would work for kernel clients. /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote: > > Except that isn't good enough - the IMA kernel side doesn't know that this > > tpm is now acting as the 'main' 'default' TPM. > > Hooking the vTPM to IMA requires another patch that I haven't shown since IMA > namespacing isn't public yet. Basically we implement another ioctl() that is to > be called before the clone() in order to 'reserve' a vtpm device pair for the > calling process. During the clone() call IMA namespacing code can query the > vTPM driver for a 'reserved' device pair. Hooking IMA up after the clone() may > also work but in case of docker/golang it's better to do this before since the > language libraries do a lot after the clone automatically. Can we expect that "in the end" there will be a single patch set that contains both TPM and IMA changes? Otherwise, I see it very hard to make decision to apply TPM patches. /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Thu, Jan 21, 2016 at 03:10:40PM -0700, Jason Gunthorpe wrote: > On Thu, Jan 21, 2016 at 04:51:06PM -0500, Stefan Berger wrote: > > > > You can't let this influence the kernel UAPI design. > > > The choice is between getting this working 'today' (even if just > > locally) or discussing this with golang designers, which in the ideal > > case would cause me waiting for the next version, dealing with that > > version dependency etc., plus the delay. So, clearly, an additional > > ioctl() and ~50 lines of code make this work 'now'. Doesn't this seem > > worth it? > > Sorry, for mainline stuff like this reserve thing is not clean enough > to be acceptable. > > Why can't you just open-code a modified forkAndExecInChild in docker? > > Seriously, the golang code you showed already has special stuff to do > user namespaces before the exec, it is totally unreasonable insist I'm starting to be along the same lines with the Jason now that I've actually tried to read through the discussion up to this email. I did say something opposite in last week but will take my words back since I had too weak understanding back then (you *never* should response to dicussions like this when you are overloaded with something else). I haven't yet locked whether the fd or major/minor passing is the right choice but at least I do agree that we cannot make decission based on some golang code in Docker. For the new mainline code we should pick the most robust API, nothing more or less... /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 01/26/2016 09:36:03 PM: > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > > Presumably some namespace magic can be used to make them show up as > > > tpm0 in a container? > > > > The magic is to have the container management stack create the device pair. > > From the ioctl it learns the name of the devices that were created > and it then > > finds out about the major/minor number of the created device and > have /dev/tpm0 > > with that major/minor created in the container's /dev directory. > > Is the device created before container launched? I would assume that > this would work user space accesses through /dev/tpm0. Yes, device would be created before container is launched. > > I don't know how this would work for kernel clients. For IMA we have these additional ioctls to either 'reserve' a vTPM for a container before clone() or to hook the vTPM up to a IMA namespace after clone() -- you may have read the discussions about these in other emails. As for trusted and encrypted keys and the TPM based RNG, the kernel determines the current IMA namespace a process that wants to use the kernel service is associated with. It then uses the TPM associated with the IMA namespace or returns an error if there is none. Stefan > > /Jarkko > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 01/26/2016 09:50:57 PM: > > On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote: > > > Except that isn't good enough - the IMA kernel side doesn't knowthat this > > > tpm is now acting as the 'main' 'default' TPM. > > > > Hooking the vTPM to IMA requires another patch that I haven't > shown since IMA > > namespacing isn't public yet. Basically we implement another ioctl > () that is to > > be called before the clone() in order to 'reserve' a vtpm device > pair for the > > calling process. During the clone() call IMA namespacing code can query the > > vTPM driver for a 'reserved' device pair. Hooking IMA up after the > clone() may > > also work but in case of docker/golang it's better to do this > before since the > > language libraries do a lot after the clone automatically. > > Can we expect that "in the end" there will be a single patch set that > contains both TPM and IMA changes? Otherwise, I see it very hard to make > decision to apply TPM patches. If this can be posted to the same lists, then 'yes'. I cannot set a timeframe for this, though. Nevertheless, the vTPM driver reviews were fruitful, I think. The vTPM driver could be used standalone as well, though it may be more useful in conjunction with the namespacing of IMA. Stefan > > /Jarkko > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 01/26/2016 10:13:20 PM: > > On Thu, Jan 21, 2016 at 03:10:40PM -0700, Jason Gunthorpe wrote: > > On Thu, Jan 21, 2016 at 04:51:06PM -0500, Stefan Berger wrote: > > > > > > You can't let this influence the kernel UAPI design. > > > > > The choice is between getting this working 'today' (even if just > > > locally) or discussing this with golang designers, which in the ideal > > > case would cause me waiting for the next version, dealing with that > > > version dependency etc., plus the delay. So, clearly, an additional > > > ioctl() and ~50 lines of code make this work 'now'. Doesn't this seem > > > worth it? > > > > Sorry, for mainline stuff like this reserve thing is not clean enough > > to be acceptable. > > > > Why can't you just open-code a modified forkAndExecInChild in docker? > > > > Seriously, the golang code you showed already has special stuff to do > > user namespaces before the exec, it is totally unreasonable insist > > I'm starting to be along the same lines with the Jason now that I've > actually tried to read through the discussion up to this email. I did > say something opposite in last week but will take my words back since I > had too weak understanding back then (you *never* should response to > dicussions like this when you are overloaded with something else). > > I haven't yet locked whether the fd or major/minor passing is the right > choice but at least I do agree that we cannot make decission based on > some golang code in Docker. For the new mainline code we should pick the > most robust API, nothing more or less... So we can design it how it should work from user space. So far it's solved with an ioctl. To pick up on Jason's suggestion to look into network namespaces and associated tools: What we don't want from the IMA perspective is that someone creates an IMA namespace on the host similar to how one can create a network namespace (ip netns add ...), runs (malicious) programs under a different IMA policy, and then closes that IMA namespace. On the other hand, what we want are containers having their own IMA namespace with an independent policy. Since the argument is that containers are well isolated from the host and programs running there don't influence the host, their logs would be independent. The thing is that containers are made up of multiple namespaces. So, we likely won't give direct control over IMA namespace creation through tools like network namespacing does or even a dedicate clone flag but connect it to one or multiple existing clone flags that cause creation of a (mount, pid, etc.) namespace. So that could mean that if someone creates a mount + pid namespace, and thus is well isolated, he automatically gets an IMA namespace. IMA namespacing would then provide less control for users compared to network namespacing and therefore an ioctl, issued from the clone()ing process, for IMA+vTPM driver hook-up would be sufficient. Stefan > > /Jarkko > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 07:17:17AM -0500, Stefan Berger wrote: > Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 01/26/2016 09:36:03 > PM: > > > > > On Wed, Jan 20, 2016 at 09:39:09AM -0500, Stefan Berger wrote: > > > > Presumably some namespace magic can be used to make them show up as > > > > tpm0 in a container? > > > > > > The magic is to have the container management stack create the device pair. > > > From the ioctl it learns the name of the devices that were created > > and it then > > > finds out about the major/minor number of the created device and > > have /dev/tpm0 > > > with that major/minor created in the container's /dev directory. > > > > Is the device created before container launched? I would assume that > > this would work user space accesses through /dev/tpm0. > > Yes, device would be created before container is launched. > > > > > I don't know how this would work for kernel clients. > > For IMA we have these additional ioctls to either 'reserve' a vTPM for a > container before clone() or to hook the vTPM up to a IMA namespace after clone > () -- you may have read the discussions about these in other emails. As for > trusted and encrypted keys and the TPM based RNG, the kernel determines the > current IMA namespace a process that wants to use the kernel service is > associated with. It then uses the TPM associated with the IMA namespace or > returns an error if there is none. Yeah, I now have read the discussion but I still don't fully understand this. I spent 2-3 hours yesterday reading it and frankly don't want do it again. If I ask something that was already hidden somewhere there, I just didn't get it. If this applies to other kernel services than IMA, why this feature is called IMA namespace? I.e. as far as I can understand your description you could: 1. Use this feature without using IMA. Create namespace and apply TPM emulator or software implementation of TPM (you could do this for example with SGX). 2. Use this feature without TPM. Just create namespace and use IMA there. "TPM namespace" that popped out in your and Jaosns discussion and also "IMA namespace" both seem not to describe what is being developed. Also, I'm wondering is it right to have this as a separate module or should this be part of the core TPM infrastructure? /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 07:20:48AM -0500, Stefan Berger wrote: > Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote on 01/26/2016 09:50:57 > PM: > > > > > > On Wed, Jan 20, 2016 at 10:01:38PM -0500, Stefan Berger wrote: > > > > Except that isn't good enough - the IMA kernel side doesn't knowthat this > > > > tpm is now acting as the 'main' 'default' TPM. > > > > > > Hooking the vTPM to IMA requires another patch that I haven't > > shown since IMA > > > namespacing isn't public yet. Basically we implement another ioctl > > () that is to > > > be called before the clone() in order to 'reserve' a vtpm device > > pair for the > > > calling process. During the clone() call IMA namespacing code can query the > > > vTPM driver for a 'reserved' device pair. Hooking IMA up after the > > clone() may > > > also work but in case of docker/golang it's better to do this > > before since the > > > language libraries do a lot after the clone automatically. > > > > Can we expect that "in the end" there will be a single patch set that > > contains both TPM and IMA changes? Otherwise, I see it very hard to make > > decision to apply TPM patches. > > If this can be posted to the same lists, then 'yes'. I cannot set a timeframe > for this, though. Nevertheless, the vTPM driver reviews were fruitful, I think. Thanks, this is good to hear. > The vTPM driver could be used standalone as well, though it may be more useful > in conjunction with the namespacing of IMA. > > Stefan /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 07:17:17AM -0500, Stefan Berger wrote: > after clone() -- you may have read the discussions about these in other > emails. As for trusted and encrypted keys and the TPM based RNG, > the Oh that is a terrifying thought, the kernel should never use a vtpm RNG :| Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 07:42:17AM -0500, Stefan Berger wrote: > What we don't want from the IMA perspective is that someone creates an > IMA namespace on the host similar to how one can create a network > namespace (ip netns add ...), runs (malicious) programs under a > different IMA policy, and then closes that IMA namespace. Presumably who ever is implementing IMA namespaces has some kind of solution for this though? That seems like a very difficult problem. > hand, what we want are containers having their own IMA namespace with > an independent policy. Since the argument is that containers are well > isolated from the host and programs running there don't influence the > host, their logs would be independent. The thing is that containers are > made up of multiple namespaces. So, we likely won't give direct control > over IMA namespace creation through tools like network namespacing does > or even a dedicate clone flag but connect it to one or multiple > existing clone flags that cause creation of a (mount, pid, etc.) > namespace. So that could mean that if someone creates a mount + pid > namespace, and thus is well isolated, he automatically gets an IMA > namespace. That isn't nearly good enough, mount namespaces don't mean you are isolated. It isn't until another tool like docker goes through and alters the child's mount table that some degree of isolation is achieved.. I don't think there is a generic kernel side point where it could tell the child is isolated enough. Whatever that means. Doesn't selinux have the exact same problem? How does selinux handle namespaces? > IMA namespacing would then provide less control for users compared to > network namespacing and therefore an ioctl, issued from the clone()ing > process, for IMA+vTPM driver hook-up would be sufficient. I agree IMA namespaces probably don't need the whole 'ip setns' type interface, but realistically if an ioctl is provided to install a TPM in a child namespace it should let anyone with privilege in the parent namespaces install a TPM in a child IMA namespace, that is just how the namespace APIs seem to work. That said, maybe looking at selinux namespaces interaction will give a different idea.. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Tue, Jan 26, 2016 at 06:22:46PM -0500, Stefan Berger wrote: > > I don't have an answer to that, are there other virtualization devices > > you can reference as an example? > I don't have a virtual device example that doesn't have a parent. I had > looked at virtio_console for, but that one also has a parent. > [1]http://lxr.free-electrons.com/source/drivers/char/virtio_console.c#L1439 virtio console is really a 'hardware' device under a VM setting, so it's parent make some sense. Are you aware of any devices used directly with containers? I only know of netdevices, and I'm not sure that is relevant. > > Also, I would probably drop the get_device in vtpm_fops_open, use the > > implicit get_device of device_initialize instead. Then drop the > > put_device in vtpm_destroy_vtpm_dev and use only one at the end of the > > fops release. > We have the implicit get_device in vtpm_create_vtpm_dev and would > therefore try to keep the put_device in vtpm_destroy_vtpm_dev. No, go back and re-read my last message. vtpm_destroy_vtpm_dev shouldn't even exist, evey call to it looks kinda weird.. vtpm_create_device_pair creates a flip and deleting the flip should delete everything else. The only time there are special cases are within vtpm_create_vtpm_dev itself, which should be handled in-line with goto-unwinds. Return the struct file from vtpm_create_device_pair not the struct vtpm_dev to make this clear. 'vtpm_delete_device_pair' should just put the flip, and all that clean up should be in the flip release. Doing any clean up while the file is still active just creates complex scenarios for no real reason. I'd also drop vtpm_cleanup, it doesn't look right to me... Ie this doesn't make any sense: + vtpm_dev->file->f_op = &null_fops; null_fops is still a pointer within this module, so the module still cannot be unloaded. If the module cannot be unloaded there is no reason to have vtpm_cleanup. I don't know if there is a solution to that, but generally speaking module detach is hard, racy, and best avoided. In a case like this killing all the vtpm daemons will allow module unload, so I'd just go with that much simpler solution. Just ensure the module lock is held while any vtpms exist. I actually thought the file code did this for the module, so I'm susprised that vtpm_cleanup would even be callable at all? Once all that is done then the flip alone controls the lifetime of the rest of the objects and it is very easy to reason about. > True. I have not been able to undo fd_install so far and cannot find > any other code that does that. So, I pulled that out and put it after > the successful copy_to_user(). That looks much better, yes. > > For instance, get_timeouts is never called in vtpm, which is not > > correct. The soft tpm should have an opportunity to tell the kernel > > the timeout values. > That's seems a chick-and-egg problem. The vTPM can only be started once > the ioctl has provided the fd. Hence the work queue suggestion. Start a work queue task/kernel thread/something right before fd_install that does tpm_get_timeouts and tpm_register. The ioctl will return and immediately read() on the fd will be available for get_timeouts. The user space side must process the read and return a response within the default tpm timeout period. So, userspace should not call the ioctl to register the tpm until it is completely prepared to process tpm commands and it should immediately process commands once the ioctl returns. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 06:22:39AM -0800, Jarkko Sakkinen wrote: > Also, I'm wondering is it right to have this as a separate module or > should this be part of the core TPM infrastructure? The vtpm itself should be a module. Any namespace support for TPM (whatever form that takes) should be in the core infrastructure and apply equally to all TPMs. I do not like the idea that only vtpm could participate in namespace support. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Tue, Jan 26, 2016 at 06:06:17PM -0800, Jarkko Sakkinen wrote: > On Tue, Jan 26, 2016 at 10:58:16AM -0700, Jason Gunthorpe wrote: > > On Tue, Jan 26, 2016 at 05:56:59AM -0800, Jarkko Sakkinen wrote: > > > On Mon, Jan 25, 2016 at 08:19:19PM -0700, Jason Gunthorpe wrote: > > > > On Mon, Jan 25, 2016 at 05:46:52PM -0800, Jarkko Sakkinen wrote: > > > > > > The only issue is error unwind, the tpmm version assumes devm works > > > > > > for that, but the vtpm_dev will not be destroyed if the tpm attach > > > > > > fails, do devm is not appropriate. > > > > > > > > > > At the moment tpmm_chip_alloc() does not use devm for anything. Are you > > > > > speaking about stuff that happens between alloc and register as some of > > > > > the drivers use devm to associate resources to pdev at that point? > > > > > > > > Ugh, well I missed that when you made those patches. > > > > > > > > You need to put the devm back ASAP, as it is all the drivers now have > > > > broken error handling. eg > > > > > > > > static int tpm_tis_init(struct device *dev, struct tpm_info *tpm_info, > > > > acpi_handle acpi_dev_handle) > > > > [..] > > > > chip = tpmm_chip_alloc(dev, &tpm_tis); > > > > [..] > > > > if (IS_ERR(chip)) > > > > return PTR_ERR(chip); > > > > > > > > Obviously leaks chip. > > > > > > Could you describe how it leaks a chip and what do you mean by verb > > > "leak"? > > > > It looks like you alread figured this out. release is never called, so > > the kalloc for chip is never freed. > > > > > > The function is called tpmm_ specifically because it is supposed to > > > > use devm resource unwind, it was a big mistake to remove the devm and > > > > not rename the function or update the comments. > > > > > > The function does not reserve any resources using devm. Should it? > > > > It originally did, but commit 313d21eeab9282e01fdcecd40e9ca87e0953627f > > took it out and didn't change the function name. > > > > The original version looked like: > > > > struct tpm_chip *tpmm_chip_alloc(struct device *dev, > > const struct tpm_class_ops *ops) > > { > > [..] > > devm_add_action(dev, tpmm_chip_remove, chip); > > dev_set_drvdata(dev, chip); > > > > return chip; > > } > > > > > > With the change to use a struct device the devm action should have > > > > been a device_put to pair with the device_initialize. > > > > > > Well, I found a regression that is luckily a rare one. > > > > It isn't rare, any error path will cause this. > > > > > So now I do have regression that does show up. If you don't have it, > > > it does not make sense to "fix" the code even if it looks obviously > > > wrong. > > > > That isn't the kernel way, this is very obviosly a bug, I don't need > > to see 'proof' to know it needs fixing. I have to defend my position here. Of course you have to proof in a way or another that there is a regresssion. Otherwise you cannot speak that there is a bug. It's by definition like that. I need to see the regression in order to apply a bug fix to the tpmdd tree. /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 11:24:48AM -0700, Jason Gunthorpe wrote: > On Wed, Jan 27, 2016 at 06:22:39AM -0800, Jarkko Sakkinen wrote: > > > Also, I'm wondering is it right to have this as a separate module or > > should this be part of the core TPM infrastructure? > > The vtpm itself should be a module. > > Any namespace support for TPM (whatever form that takes) should be in > the core infrastructure and apply equally to all TPMs. > > I do not like the idea that only vtpm could participate in namespace > support. Given your discussion I already see it participating to the namespace support. > Jason /Jarkko ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/27/2016 12:58:39 PM: > > On Wed, Jan 27, 2016 at 07:42:17AM -0500, Stefan Berger wrote: > > What we don't want from the IMA perspective is that someone creates an > > IMA namespace on the host similar to how one can create a network > > namespace (ip netns add ...), runs (malicious) programs under a > > different IMA policy, and then closes that IMA namespace. > > Presumably who ever is implementing IMA namespaces has some kind of > solution for this though? > > That seems like a very difficult problem. > > > hand, what we want are containers having their own IMA namespace with > > an independent policy. Since the argument is that containers are well > > isolated from the host and programs running there don't influence the > > host, their logs would be independent. The thing is that containers are > > made up of multiple namespaces. So, we likely won't give direct control > > over IMA namespace creation through tools like network namespacing does > > or even a dedicate clone flag but connect it to one or multiple > > existing clone flags that cause creation of a (mount, pid, etc.) > > namespace. So that could mean that if someone creates a mount + pid > > namespace, and thus is well isolated, he automatically gets an IMA > > namespace. > > That isn't nearly good enough, mount namespaces don't mean you are > isolated. It isn't until another tool like docker goes through and > alters the child's mount table that some degree of isolation is > achieved.. > > I don't think there is a generic kernel side point where it could tell > the child is isolated enough. Whatever that means. I agree. Which set of namespaces is enough for running any program in this set of namespaces (aka container) and being able to forget about the list of measurements once the set of namespaces goes away because whatever ran there couldn't have influenced the host. I would say mount, network, and user namespace is such a set. Probably also include PID namespace in that space assuming that in a shared PID namespace one process could signal other processes. Though this may be mitigated with user namespace mapping etc.. Not clear about UTS namespace, but may not be so important. To be on the safe side, maybe all would be required and one gets an IMA namespace only then. > > Doesn't selinux have the exact same problem? How does selinux handle > namespaces? They solve it by mounting with a context option, which enforces an sVirt SELinux label across all files that the container user then cannot change. tmpfs /dev tmpfs rw,context= "system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,mode=755 0 0 devpts /dev/pts devpts rw,context= "system_u:object_r:svirt_sandbox_file_t:s0:c322,c860 ",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0 shm /dev/shm tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,nodev,noexec,relatime,size=65536k 0 0 tmpfs /sys/fs/cgroup tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,nodev,noexec,relatime 0 0 tmpfs /run/secrets tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,nodev,noexec,relatime 0 0 tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,mode=755 0 0 tmpfs /proc/latency_stats tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,mode=755 0 0 tmpfs /proc/timer_stats tmpfs rw,context="system_u:object_r:svirt_sandbox_file_t:s0:c322,c860",nosuid,mode=755 0 0 > > > IMA namespacing would then provide less control for users compared to > > network namespacing and therefore an ioctl, issued from the clone()ing > > process, for IMA+vTPM driver hook-up would be sufficient. > > I agree IMA namespaces probably don't need the whole 'ip setns' type > interface, but realistically if an ioctl is provided to install a TPM > in a child namespace it should let anyone with privilege in the parent > namespaces install a TPM in a child IMA namespace, that is just how > the namespace APIs seem to work. Agree. > > That said, maybe looking at selinux namespaces interaction will give a > different idea.. See above. We cannot use the same trick. Stefan > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 04:58:51PM -0500, Stefan Berger wrote: > > I don't think there is a generic kernel side point where it could tell > > the child is isolated enough. Whatever that means. > I agree. Which set of namespaces is enough for running any program in > this set of namespaces (aka container) and being able to forget > about It isn't just the presense of namespaces that matter, eg a net mount namespace does not mean access is denied to the parent namespace, net namespaces don't mean devices are isolated, etc. This is not a good direction to go, access to an IMA namespace needs to be very strongly controlled, 'enough namespaces' is not a sufficient criteria! Any flaw in the access criteria immediately destroys the security of IMA in non-container contexts, so this needs to be done very carefully. > > Doesn't selinux have the exact same problem? How does selinux handle > > namespaces? > They solve it by mounting with a context option, which enforces an > sVirt SELinux label across all files that the container user then > cannot change. This sounds very sane. > > That said, maybe looking at selinux namespaces interaction will give a > > different idea.. > See above. We cannot use the same trick. Hmm, well, it certainly seems to be a lot of what is required, and like a much better direction than trying to use namespaces. Arranging for an IMA namespace to only exists in association with a SELinux label - and then rely on SELinux to provide the necessary security isolation instead of trying to do the same thing with namespaces sounds more likely to succeed.. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/27/2016 01:24:48 PM: > Subject: Re: [tpmdd-devel] [RFC PATCH 3/4] Implement driver for > supporting multiple emulated TPMs > > On Wed, Jan 27, 2016 at 06:22:39AM -0800, Jarkko Sakkinen wrote: > > > Also, I'm wondering is it right to have this as a separate module or > > should this be part of the core TPM infrastructure? > > The vtpm itself should be a module. > > Any namespace support for TPM (whatever form that takes) should be in > the core infrastructure and apply equally to all TPMs. > > I do not like the idea that only vtpm could participate in namespace > support. How would the usage model of the TPM shared by multiple namespaces look like? What subset of functionality is available to each namespace and what works and what doesn't? Again, how is the sharing of PCRs accomplished? I think we should treat this sharing of the hardware TPM discussion orthogonal to the vTPM so that we don't get hung up with it. Stefan > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/27/2016 05:25:34 PM: > > On Wed, Jan 27, 2016 at 04:58:51PM -0500, Stefan Berger wrote: > > > > I don't think there is a generic kernel side point where it could tell > > > the child is isolated enough. Whatever that means. > > > I agree. Which set of namespaces is enough for running any program in > > this set of namespaces (aka container) and being able to forget > > about > > It isn't just the presense of namespaces that matter, eg a net mount > namespace does not mean access is denied to the parent namespace, net > namespaces don't mean devices are isolated, etc. > > This is not a good direction to go, access to an IMA namespace needs to > be very strongly controlled, 'enough namespaces' is not a sufficient > criteria! Isolation should be a criteria and isolation becomes better with more namespaces enabled. That way one can run any program inside the set of namespaces and not harm the host or any other namespaces / containers. > > Any flaw in the access criteria immediately destroys the security of > IMA in non-container contexts, so this needs to be done very > carefully. > > > > Doesn't selinux have the exact same problem? How does selinux handle > > > namespaces? > > > They solve it by mounting with a context option, which enforces an > > sVirt SELinux label across all files that the container user then > > cannot change. > > This sounds very sane. > > > > That said, maybe looking at selinux namespaces interaction will give a > > > different idea.. > > > See above. We cannot use the same trick. > > Hmm, well, it certainly seems to be a lot of what is required, > and like a much better direction than trying to use namespaces. I don't agree. We want to allow users to run the own IMA appraisal policy. For that files need to be signed and the user's key passed to the IMA namespace. To enable that we need per-file file signatures, not some single label that works across all files in a filesystem. IMA's appraisal mode just doesn't work this way. > > Arranging for an IMA namespace to only exists in association with a > SELinux label - and then rely on SELinux to provide the necessary > security isolation instead of trying to do the same thing with > namespaces sounds more likely to succeed.. I am not sure whether SELinux labeling alone provides enough isolation. And it's likely not just the label that's important but all the rules that go with it to determine what a process can do and what not. How do you even evaluate that from inside the kernel that it's worthy an IMA namespace? Stefan > > Jason > ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
On Wed, Jan 27, 2016 at 05:55:50PM -0500, Stefan Berger wrote: > > be very strongly controlled, 'enough namespaces' is not a sufficient > > criteria! > Isolation should be a criteria and isolation becomes better with more > namespaces enabled. Yes, but IMA needs complete isolation, not just 'better' isolation. > That way one can run any program inside the set of namespaces and > not harm the host or any other namespaces / containers. No, that is not how namespaces work at the kernel level, they don't provide isolation until after userspace configures the isolation features. Or said another way, it is not better than CAP_ADMIN, because anyone with CAP_ADMIN could create namespaces and manipulate them from user space in a way that can substantially defeat IMA. Do not assume 'docker' is the thing doing this. Assume 'hostile-docker' is involved and make a scheme that doesn't totally break IMAs security promise. For instance I could use hostile-docker to create all the namespaces, including IMA, then drop in eth0 and bind mount /, and run software that does something hostile to eth0 - and I think that is totally contry to the guarentee a system running IMA should provide. Specifically that CAP_ADMIN should not be able to opt out of IMA. > > Hmm, well, it certainly seems to be a lot of what is required, > > and like a much better direction than trying to use namespaces. > I don't agree. We want to allow users to run the own IMA appraisal > policy. For that files need to be signed and the user's key passed to > the IMA namespace. To enable that we need per-file file signatures, not > some single label that works across all files in a filesystem. IMA's > appraisal mode just doesn't work this way. Isn't that exactly what SElinux is trying to do? The SElinux problem is 'allow users to run their own SELinux policy'. Just like IMA, SELinux has per-file attributes that have a user security label. IMA has a per-file user signature in the attribute instead. So, for IMA you'd do .. make mount namespace .. mount --bind /.../container/fs -o ima_namespace=foo / After which all accesses to / would use the IMA policy namespace foo, and read the signature from the attribute in the filesystem. I'm not sure it is better than doing it at clone, but it doesn't seem out of line. > I am not sure whether SELinux labeling alone provides enough > isolation. Nor am I, but maybe there is a way forward like that. While with namespaces I can tell you right now, there can be no easy solution that would be better than testing CAP_ADMIN :| > And it's likely not just the label that's important but all the rules > that go with it to determine what a process can do and what not. How do > you even evaluate that from inside the kernel that it's worthy an IMA > namespace? When IMA is turned on user space tells the kernel these details? So the restriction kernel side becomes 1) The user space has told the kernel something about SELinux security labels for use with IMA 2) The mount option contains an IMA namespace and the SELinux label compatible with #1 Then allow the IMA namespace to be used for that mount At least that way it something strong that can be reasoned about.. And is opt-out by default. Anyhow just brainstorming for you.. Jason ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote on 01/25/2016 10:46:32 PM: > > On Mon, Jan 25, 2016 at 08:05:14PM -0500, Stefan Berger wrote: > > > > And confused why there is a miscdev and a alloc_chrdev_region ? > > > > miscdev = /dev/vtpmx - that should be ok, no ? > > Yes > > > So, I just removed alloc_chrdev_region and with that the assignment of a > > major+minor to the virtual device. This works fine on device creation but on > > device destruction something odd happens in sysfs. > > > With two 'vtpmctrl' test programs running sysfs looks like this: > > > > # find /sys/devices/virtual/vtpm/ > > /sys/devices/virtual/vtpm/ > > /sys/devices/virtual/vtpm/vtpms0 > > /sys/devices/virtual/vtpm/vtpms0/dev > > Ugh. I have to dig out the sysfs topic again because I think sysfs seems to be the only remaining part that could be a problem. The problem is information leakage between containers via sysfs due to sysfs entries, at least for TPM 1.2. For TPM 1.2 we have a lot of sysfs entries that show the current state of the TPM. The good news is, there's the bind-mount trick that a container management stack can apply to hide everything under /sys/devices/virtual/vtpm. To achieve that it would do 'mount -o bind [...]/nulldir /sys/device/virtual/vtpm' inside a container and container users wouldn't be able to see those sysfs entries anymore. However, the bind mount trick only works if /sys/devices/virtual/vtpm exists and it only exists if a vTPM has been created, but the first started container won't necessarily have a vtpm, so that directory will not exists, but it will appear in that container once another container has a vtpm. Is there anything we can do in the driver to help user space ? Another trick would be that the container management stack creates a 'dummy vtpm' device pair every time and keeps it around just so it has that sysfs entry and can do the bind-mounting. [mkdir under sysfs doesn't work] Stefan ------------------------------------------------------------------------------ Site24x7 APM Insight: Get Deep Visibility into Application Performance APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month Monitor end-to-end web transactions and take corrective actions now Troubleshoot faster and improve end-user experience. Signup Now! http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 3b84a8b..d38b169 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -122,5 +122,15 @@ config TCG_CRB from within Linux. To compile this driver as a module, choose M here; the module will be called tpm_crb. +config TCG_VTPM + tristate "VTPM Interface" + depends on TCG_TPM + ---help--- + This driver supports a vTPM running in userspace that receives + request from /dev/vtpmsX. The requests are sent through a + corrsponding /dev/tpmcY. The /dev/vtpmsX and /dev/tpmcY pair + can be created dyanmically through ioctls of /dev/vtpmx. + + source "drivers/char/tpm/st33zp24/Kconfig" endif # TCG_TPM diff --git a/drivers/char/tpm/Makefile b/drivers/char/tpm/Makefile index 56e8f1f..d947db2 100644 --- a/drivers/char/tpm/Makefile +++ b/drivers/char/tpm/Makefile @@ -23,3 +23,4 @@ obj-$(CONFIG_TCG_IBMVTPM) += tpm_ibmvtpm.o obj-$(CONFIG_TCG_TIS_ST33ZP24) += st33zp24/ obj-$(CONFIG_TCG_XEN) += xen-tpmfront.o obj-$(CONFIG_TCG_CRB) += tpm_crb.o +obj-$(CONFIG_TCG_VTPM) += tpm-vtpm.o diff --git a/drivers/char/tpm/tpm-vtpm.c b/drivers/char/tpm/tpm-vtpm.c new file mode 100644 index 0000000..dc6f7f6 --- /dev/null +++ b/drivers/char/tpm/tpm-vtpm.c @@ -0,0 +1,855 @@ +/* + * Copyright (C) 2015 IBM Corporation + * + * Author: Stefan Berger <stefanb@us.ibm.com> + * + * Maintained by: <tpmdd-devel@lists.sourceforge.net> + * + * Device driver for vTPM. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + */ + +#include <linux/types.h> +#include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/uaccess.h> +#include <linux/wait.h> +#include <linux/miscdevice.h> +#include <linux/vtpm.h> +#include <linux/poll.h> +#include <asm/compat.h> + +#include "tpm-vtpm.h" + +static DECLARE_BITMAP(dev_mask, VTPM_NUM_DEVICES); +static LIST_HEAD(vtpm_list); +static DEFINE_SPINLOCK(driver_lock); + +static struct class *vtpm_class; +static dev_t vtpm_devt; + +static int _vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev); +static void free_vtpm_dev(struct kref *kref); + +static void vtpm_dev_get(struct vtpm_dev *vtpm_dev) +{ + kref_get(&vtpm_dev->kref); +} + +static void vtpm_dev_put(struct vtpm_dev *vtpm_dev) +{ + if (vtpm_dev) + kref_put(&vtpm_dev->kref, free_vtpm_dev); +} + +static struct vtpm_dev *vtpm_dev_get_by_chip(struct tpm_chip *chip) +{ + struct vtpm_dev *pos, *vtpm_dev = NULL; + + rcu_read_lock(); + + list_for_each_entry_rcu(pos, &vtpm_list, list) { + if (pos->chip == chip) { + vtpm_dev = pos; + vtpm_dev_get(vtpm_dev); + break; + } + } + + rcu_read_unlock(); + + return vtpm_dev; +} + +static struct vtpm_dev *vtpm_dev_get_by_vtpm_devnum(u32 dev_num) +{ + struct vtpm_dev *pos, *vtpm_dev = NULL; + + rcu_read_lock(); + + list_for_each_entry_rcu(pos, &vtpm_list, list) { + if (pos->dev_num == dev_num) { + vtpm_dev = pos; + vtpm_dev_get(vtpm_dev); + break; + } + } + + rcu_read_unlock(); + + return vtpm_dev; +} + +static struct vtpm_dev *vtpm_dev_get_by_tpm_devnum(u32 dev_num) +{ + struct vtpm_dev *pos, *vtpm_dev = NULL; + + rcu_read_lock(); + + list_for_each_entry_rcu(pos, &vtpm_list, list) { + if (pos->chip->dev_num == dev_num) { + vtpm_dev = pos; + vtpm_dev_get(vtpm_dev); + break; + } + } + + rcu_read_unlock(); + + return vtpm_dev; +} + +/* + * Functions related to /dev/vtpms%d + */ + +/** + * vtpm_fops_read - Read TPM commands from /dev/vtpms%d + * + * Return value: + * Number of bytes read or negative error code + */ +static ssize_t vtpm_fops_read(struct file *filp, char __user *buf, + size_t count, loff_t *off) +{ + struct file_priv *priv = filp->private_data; + struct vtpm_dev *vtpm_dev = priv->vtpm_dev; + size_t len; + int sig, rc; + + sig = wait_event_interruptible(vtpm_dev->wq, vtpm_dev->req_len != 0); + if (sig) + return -EINTR; + + len = vtpm_dev->req_len; + + if (count < len) { + dev_err(&vtpm_dev->dev, + "Invalid size in recv: count=%zd, req_len=%zd\n", + count, len); + return -EIO; + } + + spin_lock(&vtpm_dev->buf_lock); + + rc = copy_to_user(buf, vtpm_dev->req_buf, len); + memset(vtpm_dev->req_buf, 0, len); + vtpm_dev->req_len = 0; + + spin_unlock(&vtpm_dev->buf_lock); + + if (rc) + return -EFAULT; + + return len; +} + +/** + * vtpm_fops_write - Write TPM responses to /dev/vtpms%d + * + * Return value: + * Number of bytes read or negative error value + */ +static ssize_t vtpm_fops_write(struct file *filp, const char __user *buf, + size_t count, loff_t *off) +{ + struct file_priv *priv = filp->private_data; + struct vtpm_dev *vtpm_dev = priv->vtpm_dev; + + if (count > sizeof(vtpm_dev->resp_buf)) + return -EIO; + + spin_lock(&vtpm_dev->buf_lock); + + vtpm_dev->req_len = 0; + + if (copy_from_user(vtpm_dev->resp_buf, buf, count)) { + spin_unlock(&vtpm_dev->buf_lock); + return -EFAULT; + } + + vtpm_dev->resp_len = count; + + spin_unlock(&vtpm_dev->buf_lock); + + wake_up_interruptible(&vtpm_dev->wq); + + return count; +} + +/* + * vtpm_fops_poll: Poll status of /dev/vtpms%d + * + * Return value: + * Poll flags + */ +static unsigned int vtpm_fops_poll(struct file *filp, poll_table *wait) +{ + struct file_priv *priv = filp->private_data; + struct vtpm_dev *vtpm_dev = priv->vtpm_dev; + unsigned ret; + + poll_wait(filp, &vtpm_dev->wq, wait); + + ret = POLLOUT; + if (vtpm_dev->req_len) + ret |= POLLIN | POLLRDNORM; + + return ret; +} + +/** + * vtpm_fops_open - Open vTPM device /dev/vtpms%d + * + * Return value: + * 0 on success, error code otherwise + */ +static int vtpm_fops_open(struct inode *inode, struct file *filp) +{ + struct vtpm_dev *vtpm_dev = + container_of(inode->i_cdev, struct vtpm_dev, cdev); + struct file_priv *priv; + + if (test_and_set_bit(STATE_OPENED_BIT, &vtpm_dev->state)) { + dev_dbg(vtpm_dev->pdev, + "Another process owns this vTPM device\n"); + return -EBUSY; + } + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (priv == NULL) { + clear_bit(STATE_OPENED_BIT, &vtpm_dev->state); + return -ENOMEM; + } + + priv->vtpm_dev = vtpm_dev; + + get_device(vtpm_dev->pdev); + + filp->private_data = priv; + + return 0; +} + +/* + * vtpm_fops_release: Close /dev/vtpms%d + * + * If device pair is not in use anymore and flags permit, delete + * the device pair. + * + * Return value: + * Always returns 0. + */ +static int vtpm_fops_release(struct inode *inode, struct file *filp) +{ + struct file_priv *priv = filp->private_data; + struct vtpm_dev *vtpm_dev = priv->vtpm_dev; + + filp->private_data = NULL; + put_device(vtpm_dev->pdev); + kfree(priv); + + if (!(vtpm_dev->flags & VTPM_FLAG_KEEP_DEVPAIR)) { + /* + * device still marked as open; this prevents others from + * opening it while we try to delete it + */ + if (_vtpm_delete_device_pair(vtpm_dev) == 0) { + /* vtpm_dev gone */ + return 0; + } + } + + clear_bit(STATE_OPENED_BIT, &vtpm_dev->state); + /* won't generate reponses anymore */ + wake_up_interruptible(&vtpm_dev->wq); + + return 0; +} + +static const struct file_operations vtpm_fops = { + .owner = THIS_MODULE, + .llseek = no_llseek, + .open = vtpm_fops_open, + .read = vtpm_fops_read, + .write = vtpm_fops_write, + .poll = vtpm_fops_poll, + .release = vtpm_fops_release, +}; + +/* + * Functions invoked by the core TPM driver to send TPM commands to + * /dev/vtpms%d and receive responses from there. + */ + +/* + * Called when core TPM driver reads TPM responses from /dev/vtpms%d. + * + * Return value: + * Number of TPM response bytes read, negative error value otherwise + */ +static int vtpm_tpm_op_recv(struct tpm_chip *chip, u8 *buf, size_t count) +{ + struct vtpm_dev *vtpm_dev = vtpm_dev_get_by_chip(chip); + int sig; + size_t len; + + if (!vtpm_dev) + return -EIO; + + /* wait for response or responder gone */ + sig = wait_event_interruptible(vtpm_dev->wq, + (vtpm_dev->resp_len != 0 + || !test_bit(STATE_OPENED_BIT, &vtpm_dev->state))); + if (sig) { + len = -EINTR; + goto err_exit; + } + + /* process gone ? */ + if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state)) + return -EIO; + + len = vtpm_dev->resp_len; + if (count < len) { + dev_err(&vtpm_dev->dev, + "Invalid size in recv: count=%zd, resp_len=%zd\n", + count, len); + len = -EIO; + goto err_exit; + } + + spin_lock(&vtpm_dev->buf_lock); + + memcpy(buf, vtpm_dev->resp_buf, len); + vtpm_dev->resp_len = 0; + + spin_unlock(&vtpm_dev->buf_lock); + +err_exit: + vtpm_dev_put(vtpm_dev); + + return len; +} + +/* + * Called when core TPM driver forwards TPM requests to /dev/vtpms%d. + * + * Return value: + * 0 in case of success, negative error value otherwise. + */ +static int vtpm_tpm_op_send(struct tpm_chip *chip, u8 *buf, size_t count) +{ + struct vtpm_dev *vtpm_dev = vtpm_dev_get_by_chip(chip); + int rc = 0; + + if (!vtpm_dev) + return -EIO; + + if (!test_bit(STATE_OPENED_BIT, &vtpm_dev->state)) { + rc = -EINVAL; + goto err_exit; + } + + if (count > sizeof(vtpm_dev->req_buf)) { + dev_err(&vtpm_dev->dev, + "Invalid size in send: count=%zd, buffer size=%zd\n", + count, sizeof(vtpm_dev->req_buf)); + rc = -EIO; + goto err_exit; + } + + spin_lock(&vtpm_dev->buf_lock); + + vtpm_dev->resp_len = 0; + + vtpm_dev->req_len = count; + memcpy(vtpm_dev->req_buf, buf, count); + + spin_unlock(&vtpm_dev->buf_lock); + + wake_up_interruptible(&vtpm_dev->wq); + +err_exit: + vtpm_dev_put(vtpm_dev); + + return rc; +} + +static void vtpm_tpm_op_cancel(struct tpm_chip *chip) +{ + /* not supported */ +} + +static u8 vtpm_tpm_op_status(struct tpm_chip *chip) +{ + return 0; +} + +static bool vtpm_tpm_req_canceled(struct tpm_chip *chip, u8 status) +{ + return (status == 0); +} + +static const struct tpm_class_ops vtpm_tpm_ops = { + .recv = vtpm_tpm_op_recv, + .send = vtpm_tpm_op_send, + .cancel = vtpm_tpm_op_cancel, + .status = vtpm_tpm_op_status, + .req_complete_mask = 0, + .req_complete_val = 0, + .req_canceled = vtpm_tpm_req_canceled, +}; + +/* + * Code related to creation and deletion of device pairs + */ + +static void vtpm_dev_release(struct device *dev) +{ + struct vtpm_dev *vtpm_dev = container_of(dev, struct vtpm_dev, dev); + + vtpm_dev_put(vtpm_dev); +} + +static void free_vtpm_dev(struct kref *kref) +{ + struct vtpm_dev *vtpm_dev = container_of(kref, struct vtpm_dev, kref); + + spin_lock(&driver_lock); + clear_bit(vtpm_dev->dev_num, dev_mask); + spin_unlock(&driver_lock); + + kfree(vtpm_dev); +} + +struct vtpm_dev *vtpm_create_vtpm_dev(struct platform_device **ppdev) +{ + struct vtpm_dev *vtpm_dev; + struct platform_device *pdev; + struct device *dev; + + vtpm_dev = kzalloc(sizeof(*vtpm_dev), GFP_KERNEL); + if (vtpm_dev == NULL) + return ERR_PTR(-ENOMEM); + + kref_init(&vtpm_dev->kref); + init_waitqueue_head(&vtpm_dev->wq); + spin_lock_init(&vtpm_dev->buf_lock); + + spin_lock(&driver_lock); + vtpm_dev->dev_num = find_first_zero_bit(dev_mask, VTPM_NUM_DEVICES); + + if (vtpm_dev->dev_num >= VTPM_NUM_DEVICES) { + spin_unlock(&driver_lock); + kfree(vtpm_dev); + return ERR_PTR(-ENOMEM); + } + + pdev = platform_device_register_simple("tpm_vtpm", vtpm_dev->dev_num, + NULL, 0); + if (IS_ERR(pdev)) { + spin_unlock(&driver_lock); + kfree(vtpm_dev); + return ERR_PTR(PTR_ERR(pdev)); + } + *ppdev = pdev; + + set_bit(vtpm_dev->dev_num, dev_mask); + spin_unlock(&driver_lock); + + dev = &pdev->dev; + + scnprintf(vtpm_dev->devname, sizeof(vtpm_dev->devname), + VTPM_DEV_PREFIX_SERVER"%d", vtpm_dev->dev_num); + + vtpm_dev->pdev = dev; + + dev_set_drvdata(dev, vtpm_dev); + + vtpm_dev->dev.class = vtpm_class; + vtpm_dev->dev.release = vtpm_dev_release; + vtpm_dev->dev.parent = vtpm_dev->pdev; + + vtpm_dev->dev.devt = MKDEV(MAJOR(vtpm_devt),vtpm_dev->dev_num); + + dev_set_name(&vtpm_dev->dev, "%s", vtpm_dev->devname); + + device_initialize(&vtpm_dev->dev); + + cdev_init(&vtpm_dev->cdev, &vtpm_fops); + vtpm_dev->cdev.owner = vtpm_dev->pdev->driver->owner; + vtpm_dev->cdev.kobj.parent = &vtpm_dev->dev.kobj; + + return vtpm_dev; +} + +/* + * Create a /dev/vtpms%d and /dev/vtpms%d device pair. + * + * Return value: + * Returns vtpm_dev pointer on success, an error value otherwise + */ +static struct vtpm_dev *vtpm_create_device_pair( + struct vtpm_new_pair *vtpm_new_pair) +{ + struct vtpm_dev *vtpm_dev; + struct platform_device *pdev = NULL; + int rc; + + vtpm_dev = vtpm_create_vtpm_dev(&pdev); + if (IS_ERR(vtpm_dev)) + return vtpm_dev; + + vtpm_dev->flags = vtpm_new_pair->flags; + + rc = device_add(&vtpm_dev->dev); + if (rc) { + kfree(vtpm_dev); + vtpm_dev = NULL; + goto err_device_add; + } + + rc = cdev_add(&vtpm_dev->cdev, vtpm_dev->dev.devt, 1); + if (rc) + goto err_cdev_add; + + vtpm_dev->chip = tpmm_chip_alloc_pattern(vtpm_dev->pdev, + &vtpm_tpm_ops, + VTPM_DEV_PREFIX_CLIENT"%d"); + if (IS_ERR(vtpm_dev->chip)) { + rc = PTR_ERR(vtpm_dev->chip); + goto err_chip_alloc; + } + + if (vtpm_dev->flags & VTPM_FLAG_TPM2) + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_TPM2; + + vtpm_dev->chip->flags |= TPM_CHIP_FLAG_NO_SYSFS | TPM_CHIP_FLAG_NO_LOG; + + rc = tpm_chip_register(vtpm_dev->chip); + if (rc) + goto err_chip_register; + + spin_lock(&driver_lock); + list_add_rcu(&vtpm_dev->list, &vtpm_list); + spin_unlock(&driver_lock); + + vtpm_new_pair->tpm_dev_num = vtpm_dev->chip->dev_num; + vtpm_new_pair->vtpm_dev_num = vtpm_dev->dev_num; + + return vtpm_dev; + +err_chip_register: +err_chip_alloc: + cdev_del(&vtpm_dev->cdev); + +err_cdev_add: + device_unregister(&vtpm_dev->dev); + +err_device_add: + platform_device_unregister(pdev); + + return ERR_PTR(rc); +} + +/* + * Delete a /dev/vtpmc%d and /dev/vtpms%d device pair without checking + * whether it is still in use. + */ +static int _vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev) +{ + struct device *dev = vtpm_dev->pdev; + struct platform_device *pdev = + container_of(dev, struct platform_device, dev); + + tpm_chip_unregister(vtpm_dev->chip); + + cdev_del(&vtpm_dev->cdev); + device_unregister(&vtpm_dev->dev); + + platform_device_unregister(pdev); + + spin_lock(&driver_lock); + list_del_rcu(&vtpm_dev->list); + spin_unlock(&driver_lock); + + return 0; +} + +/* + * Delete a /dev/vtpmc%d and /dev/vtpms%d device pair. + * + * Return value: + * Returns 0 on success, -EBUSY of the device pair is still in use + */ +static int vtpm_delete_device_pair(struct vtpm_dev *vtpm_dev) +{ + if (test_bit(STATE_OPENED_BIT, &vtpm_dev->state)) { + dev_dbg(vtpm_dev->pdev, "Device is busy\n"); + return -EBUSY; + } + + return _vtpm_delete_device_pair(vtpm_dev); +} + +/* + * Delete all /dev/tpm%d and /dev/vtpm%d device pairs. + * This function is only to be called when the module is removed and + * we are sure that there are no more users of any device. + */ +static int vtpm_delete_device_pairs(void) +{ + struct vtpm_dev *vtpm_dev; + int rc = 0; + + rcu_read_lock(); + + list_for_each_entry_rcu(vtpm_dev, &vtpm_list, list) { + rc = vtpm_delete_device_pair(vtpm_dev); + if (rc) + break; + } + + rcu_read_unlock(); + + return rc; +} + +static int vtpm_probe(struct platform_device *pdev) +{ + return 0; +} + +static struct platform_driver vtpm_drv = { + .probe = vtpm_probe, + .driver = { + .name = "tpm_vtpm", + }, +}; + +static int vtpm_init(void) +{ + return platform_driver_register(&vtpm_drv); +} + +/* + * Called for module removal; no more module users + */ +static void vtpm_cleanup(void) +{ + vtpm_delete_device_pairs(); + platform_driver_unregister(&vtpm_drv); +} + +/* + * Code related to /dev/vtpmx + */ + +struct vtpmx_dev { + struct device dev; + struct device *pdev; + struct cdev cdev; +}; + +/* + * vtpmx_fops_ioctl: ioctl on /dev/vtpmx + * + * Return value: + * Returns 0 on success, -EINVAL or -EFAULT otherwise. + */ +static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl, + unsigned long arg) +{ + void __user *argp = (void __user *)arg; + struct vtpm_new_pair *vtpm_new_pair_p; + struct vtpm_new_pair vtpm_new_pair; + struct vtpm_pair *vtpm_pair_p; + struct vtpm_pair vtpm_pair; + struct vtpm_dev *vtpm_dev; + int rc = 0; + + switch (ioctl) { + case VTPM_NEW_DEV: + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + vtpm_new_pair_p = argp; + if (copy_from_user(&vtpm_new_pair, vtpm_new_pair_p, + sizeof(vtpm_new_pair))) + return -EFAULT; + vtpm_dev = vtpm_create_device_pair(&vtpm_new_pair); + if (IS_ERR(vtpm_dev)) + return PTR_ERR(vtpm_dev); + if (copy_to_user(vtpm_new_pair_p, &vtpm_new_pair, + sizeof(vtpm_new_pair))) + return -EFAULT; + return 0; + + case VTPM_DEL_DEV: + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + vtpm_pair_p = argp; + if (copy_from_user(&vtpm_pair, vtpm_pair_p, sizeof(vtpm_pair))) + return -EFAULT; + + if (vtpm_pair.tpm_dev_num != VTPM_DEV_NUM_INVALID) { + vtpm_dev = + vtpm_dev_get_by_tpm_devnum(vtpm_pair.tpm_dev_num); + if (!vtpm_dev || vtpm_delete_device_pair(vtpm_dev) < 0) + rc = -EINVAL; + vtpm_dev_put(vtpm_dev); + return rc; + } + + if (vtpm_pair.vtpm_dev_num != VTPM_DEV_NUM_INVALID) { + vtpm_dev = + vtpm_dev_get_by_vtpm_devnum(vtpm_pair.vtpm_dev_num); + if (!vtpm_dev || vtpm_delete_device_pair(vtpm_dev) < 0) + rc = -EINVAL; + vtpm_dev_put(vtpm_dev); + return rc; + } + return -EINVAL; + + case VTPM_GET_VTPMDEV: + vtpm_pair_p = argp; + if (copy_from_user(&vtpm_pair, vtpm_pair_p, sizeof(vtpm_pair))) + return -EFAULT; + + if (vtpm_pair.tpm_dev_num != VTPM_DEV_NUM_INVALID) { + vtpm_dev = + vtpm_dev_get_by_tpm_devnum(vtpm_pair.tpm_dev_num); + if (!vtpm_dev) + return -EINVAL; + + vtpm_pair.vtpm_dev_num = vtpm_dev->dev_num; + + if (copy_to_user(vtpm_pair_p, &vtpm_pair, + sizeof(vtpm_pair))) + rc = -EFAULT; + vtpm_dev_put(vtpm_dev); + return rc; + } + return -EINVAL; + + case VTPM_GET_TPMDEV: + vtpm_pair_p = argp; + if (copy_from_user(&vtpm_pair, vtpm_pair_p, sizeof(vtpm_pair))) + return -EFAULT; + + if (vtpm_pair.vtpm_dev_num != VTPM_DEV_NUM_INVALID) { + vtpm_dev = + vtpm_dev_get_by_vtpm_devnum(vtpm_pair.vtpm_dev_num); + if (!vtpm_dev) + return -EINVAL; + + vtpm_pair.tpm_dev_num = vtpm_dev->chip->dev_num; + + if (copy_to_user(vtpm_pair_p, &vtpm_pair, + sizeof(vtpm_pair))) + rc = -EFAULT; + vtpm_dev_put(vtpm_dev); + return rc; + } + return -EINVAL; + + default: + return -EINVAL; + } +} + +#ifdef CONFIG_COMPAT +static long vtpmx_fops_compat_ioctl(struct file *f, unsigned int ioctl, + unsigned long arg) +{ + return vtpmx_fops_ioctl(f, ioctl, (unsigned long)compat_ptr(arg)); +} +#endif + +static const struct file_operations vtpmx_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = vtpmx_fops_ioctl, +#ifdef CONFIG_COMPAT + .compat_ioctl = vtpmx_fops_compat_ioctl, +#endif + .llseek = noop_llseek, +}; + +static struct miscdevice vtpmx_miscdev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "vtpmx", + .fops = &vtpmx_fops, +}; + +static int vtpmx_init(void) +{ + return misc_register(&vtpmx_miscdev); +} + +static void vtpmx_cleanup(void) +{ + misc_deregister(&vtpmx_miscdev); +} + +static int __init vtpm_module_init(void) +{ + int rc; + + vtpm_class = class_create(THIS_MODULE, "vtpm"); + if (IS_ERR(vtpm_class)) { + pr_err("couldn't create vtpm class\n"); + return PTR_ERR(vtpm_class); + } + + rc = alloc_chrdev_region(&vtpm_devt, 0, VTPM_NUM_DEVICES, "vtpm"); + if (rc < 0) { + pr_err("failed to allocate char dev region\n"); + goto err_alloc_reg; + } + + rc = vtpmx_init(); + if (rc) { + pr_err("couldn't create vtpmx device\n"); + goto err_vtpmx; + } + + rc = vtpm_init(); + if (rc) { + pr_err("couldn't init vtpm layer\n"); + goto err_vtpm; + } + + return 0; + +err_vtpm: + vtpmx_cleanup(); + +err_vtpmx: + unregister_chrdev_region(vtpm_devt, VTPM_NUM_DEVICES); + +err_alloc_reg: + class_destroy(vtpm_class); + + return rc; +} + +static void __exit vtpm_module_exit(void) +{ + vtpm_cleanup(); + vtpmx_cleanup(); + unregister_chrdev_region(vtpm_devt, VTPM_NUM_DEVICES); + class_destroy(vtpm_class); +} + +subsys_initcall(vtpm_module_init); +module_exit(vtpm_module_exit); + +MODULE_AUTHOR("Stefan Berger (stefanb@us.ibm.com)"); +MODULE_DESCRIPTION("vTPM Driver"); +MODULE_VERSION("0.1"); +MODULE_LICENSE("GPL"); diff --git a/drivers/char/tpm/tpm-vtpm.h b/drivers/char/tpm/tpm-vtpm.h new file mode 100644 index 0000000..8649867 --- /dev/null +++ b/drivers/char/tpm/tpm-vtpm.h @@ -0,0 +1,58 @@ +/* + * Copyright (C) 2015 IBM Corporation + * + * Author: Stefan Berger <stefanb@us.ibm.com> + * + * Maintained by: <tpmdd-devel@lists.sourceforge.net> + * + * Device driver for vTPM. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation, version 2 of the + * License. + * + */ + +#ifndef __TPM_VTPM_H +#define __TPM_VTPM_H + +#include "tpm.h" + +#define VTPM_NUM_DEVICES TPM_NUM_DEVICES + +struct vtpm_dev { + struct kref kref; + + struct device *pdev; + struct device dev; + struct cdev cdev; + + struct tpm_chip *chip; + + u32 flags; + + int dev_num; + char devname[VTPM_DEVNAME_MAX]; + + long state; +#define STATE_OPENED_BIT 0 + + spinlock_t buf_lock; /* lock for buffers */ + + wait_queue_head_t wq; + + size_t req_len; /* length of queued TPM request */ + u8 req_buf[TPM_BUFSIZE]; /* request buffer */ + + size_t resp_len; /* length of queued TPM response */ + u8 resp_buf[TPM_BUFSIZE]; /* request buffer */ + + struct list_head list; +}; + +struct file_priv { + struct vtpm_dev *vtpm_dev; +}; + +#endif diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index c2e5d6c..c194d61 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -449,6 +449,7 @@ header-y += virtio_scsi.h header-y += virtio_types.h header-y += vm_sockets.h header-y += vt.h +header-y += vtpm.h header-y += wait.h header-y += wanrouter.h header-y += watchdog.h diff --git a/include/uapi/linux/vtpm.h b/include/uapi/linux/vtpm.h new file mode 100644 index 0000000..460282b --- /dev/null +++ b/include/uapi/linux/vtpm.h @@ -0,0 +1,52 @@ +/* + * Definitions for the VTPM interface + * Copyright (c) 2015, IBM Corporation + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + */ + +#ifndef _UAPI_LINUX_VTPM_H +#define _UAPI_LINUX_VTPM_H + +#include <linux/types.h> +#include <linux/ioctl.h> + +/* ioctls */ +#define VTPM_TPM 0xa0 + +#define VTPM_DEVNAME_MAX 16 + +#define VTPM_DEV_PREFIX_SERVER "vtpms" /* server-side device name prefix */ +#define VTPM_DEV_PREFIX_CLIENT "vtpmc" /* client-side device name prefix */ + +#define VTPM_DEV_NUM_INVALID ~0 + +struct vtpm_new_pair { + __u32 flags; /* input */ + __u32 tpm_dev_num; /* output */ + __u32 vtpm_dev_num; /* output */ +}; + +struct vtpm_pair { + __u32 tpm_dev_num; /* input or output */ + __u32 vtpm_dev_num; /* input or output */ +}; + +/* above flags */ +#define VTPM_FLAG_TPM2 1 /* choose a TPM2; mainly for sysfs entries */ +#define VTPM_FLAG_KEEP_DEVPAIR 2 /* keep the device pair once vTPM closes */ + +/* create new TPM device pair */ +#define VTPM_NEW_DEV _IOW(VTPM_TPM, 0x00, struct vtpm_new_pair) +#define VTPM_DEL_DEV _IOW(VTPM_TPM, 0x01, struct vtpm_pair) +#define VTPM_GET_TPMDEV _IOW(VTPM_TPM, 0x02, struct vtpm_pair) +#define VTPM_GET_VTPMDEV _IOW(VTPM_TPM, 0x03, struct vtpm_pair) + +#endif /* _UAPI_LINUX_VTPM_H */
From: Stefan Berger <stefanb@linux.vnet.ibm.com> This patch implements a driver for supporting multiple emulated TPMs in a system. An emulated TPM can for example be used by a Linux container or later on by namespace-enabled IMA running inside a container. The driver implements a device /dev/vtpmx that is only used to created device pairs /dev/vtpmcX (e.g., /dev/vtpmc10) and /dev/vtpmsY (e.g., /dev/vtpms11) using ioctls. The device /dev/vtpmcX is the usual TPM device created by the core TPM driver. Applications or kernel subsystems can send TPM commands to it and the corresponding /dev/vtpmY receives these commands and delivers them to an emulated TPM. The device /dev/vtpmcX can be moved into a container and appear there as /dev/tpm0. Life-cycle management is implemented by ioctls of /dev/vtpmx for creating and destroying them. The ioctl for creating a device pair returns the names of the created devices so that applications know which devices to use. Other ioctls help to find the corresponding 'other' device given the device name of one. Signed-off-by; Stefan Berger <stefanb@linux.vnet.ibm.com> --- drivers/char/tpm/Kconfig | 10 + drivers/char/tpm/Makefile | 1 + drivers/char/tpm/tpm-vtpm.c | 855 ++++++++++++++++++++++++++++++++++++++++++++ drivers/char/tpm/tpm-vtpm.h | 58 +++ include/uapi/linux/Kbuild | 1 + include/uapi/linux/vtpm.h | 52 +++ 6 files changed, 977 insertions(+) create mode 100644 drivers/char/tpm/tpm-vtpm.c create mode 100644 drivers/char/tpm/tpm-vtpm.h create mode 100644 include/uapi/linux/vtpm.h