Message ID | 20200710002209.6757-1-apronin@chromium.org (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Jarkko Sakkinen |
Headers | show |
Series | tpm: avoid accessing cleared ops during shutdown | expand |
On Thu, Jul 09, 2020 at 05:22:09PM -0700, Andrey Pronin wrote: > This patch prevents NULL dereferencing when using chip->ops while > sending TPM2_Shutdown command if both tpm_class_shutdown handler and > tpm_del_char_device are called during system shutdown. > > Both these handlers set chip->ops to NULL but don't check if it's > already NULL when they are called before using it. > > This issue was revealed in Chrome OS after a recent set of changes > to the unregister order for spi controllers, such as: > b4c6230bb0ba spi: Fix controller unregister order > f40913d2dca1 spi: pxa2xx: Fix controller unregister order > and similar for other controllers. I'm not sure I fully understand the scenario. When does thi happen? Why does not tpm_del_char_device need this? The changes listed tell me nothing. Why they have this effect? I'm just trying to understand whether this could be a regression or not. I neither understand what you mean by "and similar for other controllers." NAK for the reason that I don't understand what I'm merging. /Jarkko > > Signed-off-by: Andrey Pronin <apronin@chromium.org> > --- > drivers/char/tpm/tpm-chip.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > index 8c77e88012e9..a410ca40a3c5 100644 > --- a/drivers/char/tpm/tpm-chip.c > +++ b/drivers/char/tpm/tpm-chip.c > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev) > struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev); > > down_write(&chip->ops_sem); > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > if (!tpm_chip_start(chip)) { > tpm2_shutdown(chip, TPM2_SU_CLEAR); > tpm_chip_stop(chip); > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip) > > /* Make the driver uncallable. */ > down_write(&chip->ops_sem); > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > if (!tpm_chip_start(chip)) { > tpm2_shutdown(chip, TPM2_SU_CLEAR); > tpm_chip_stop(chip); > -- > 2.25.1 >
On Fri, Jul 10, 2020 at 4:40 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Thu, Jul 09, 2020 at 05:22:09PM -0700, Andrey Pronin wrote: > > This patch prevents NULL dereferencing when using chip->ops while > > sending TPM2_Shutdown command if both tpm_class_shutdown handler and > > tpm_del_char_device are called during system shutdown. > > > > Both these handlers set chip->ops to NULL but don't check if it's > > already NULL when they are called before using it. > > > > This issue was revealed in Chrome OS after a recent set of changes > > to the unregister order for spi controllers, such as: > > b4c6230bb0ba spi: Fix controller unregister order > > f40913d2dca1 spi: pxa2xx: Fix controller unregister order > > and similar for other controllers. > > I'm not sure I fully understand the scenario. When does thi happen? This happens during system shutdown. Here a sample stack trace from the panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 ... Call Trace: tpm_transmit_cmd+0x21/0x7f tpm2_shutdown+0x84/0xc6 tpm_chip_unregister+0xa2/0xb9 cr50_spi_remove+0x19/0x26 spi_drv_remove+0x2a/0x42 device_release_driver_internal+0x123/0x1ec bus_remove_device+0xe8/0x111 device_del+0x1bf/0x319 ? spi_unregister_controller+0xfc/0xfc device_unregister+0x12/0x28 __unregister+0xe/0x12 device_for_each_child+0x79/0xb8 spi_unregister_controller+0x27/0xfc pxa2xx_spi_remove+0x45/0xb4 device_shutdown+0x181/0x1d3 kernel_restart+0x12/0x56 SyS_reboot+0x16a/0x1e7 do_syscall_64+0x6b/0xf7 entry_SYSCALL_64_after_hwframe+0x41/0xa6 > Why does not tpm_del_char_device need this? "Not" is a typo in the sentence above, right? tpm_del_char_device *does* need the fix. When tpm_class_shutdown is called it sets chip->ops to NULL. If tpm_del_char_device is called after that, it doesn't check if chip->ops is NULL (normal kernel API and char device API calls go through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to call tpm2_shutdown(), which tries sending the command and dereferences chip->ops. > The changes listed tell > me nothing. Why they have this effect? "spi: pxa2xx: Fix controller unregister order" adds a spi_unregister_controller call to pxa2xx_spi_remove, which internally calls tpm_del_char_device, which results in the stack trace leading to the panic above. > > I'm just trying to understand whether this could be a regression or > not. > > I neither understand what you mean by "and similar for other > controllers." There was a series of spi unregister order changes for various spi controllers, including the following: 1c6221b430a0 spi: bcm2835: Fix controller unregister order f40913d2dca1 spi: pxa2xx: Fix controller unregister order b4c6230bb0ba spi: Fix controller unregister order 54000d2e15e9 spi: dw: Fix controller unregister order c8f309db490e spi: bcm2835aux: Fix controller unregister order > > NAK for the reason that I don't understand what I'm merging. > > /Jarkko > > > > > Signed-off-by: Andrey Pronin <apronin@chromium.org> > > --- > > drivers/char/tpm/tpm-chip.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > > index 8c77e88012e9..a410ca40a3c5 100644 > > --- a/drivers/char/tpm/tpm-chip.c > > +++ b/drivers/char/tpm/tpm-chip.c > > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev) > > struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev); > > > > down_write(&chip->ops_sem); > > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > > if (!tpm_chip_start(chip)) { > > tpm2_shutdown(chip, TPM2_SU_CLEAR); > > tpm_chip_stop(chip); > > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip) > > > > /* Make the driver uncallable. */ > > down_write(&chip->ops_sem); > > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > > if (!tpm_chip_start(chip)) { > > tpm2_shutdown(chip, TPM2_SU_CLEAR); > > tpm_chip_stop(chip); > > -- > > 2.25.1 > >
On Thu, 2020-07-09 at 17:22 -0700, Andrey Pronin wrote: > This patch prevents NULL dereferencing when using chip->ops while > sending TPM2_Shutdown command if both tpm_class_shutdown handler and > tpm_del_char_device are called during system shutdown. > > Both these handlers set chip->ops to NULL but don't check if it's > already NULL when they are called before using it. > > This issue was revealed in Chrome OS after a recent set of changes > to the unregister order for spi controllers, such as: > b4c6230bb0ba spi: Fix controller unregister order > f40913d2dca1 spi: pxa2xx: Fix controller unregister order > and similar for other controllers. > > Signed-off-by: Andrey Pronin <apronin@chromium.org> > --- > drivers/char/tpm/tpm-chip.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm- > chip.c > index 8c77e88012e9..a410ca40a3c5 100644 > --- a/drivers/char/tpm/tpm-chip.c > +++ b/drivers/char/tpm/tpm-chip.c > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev) > struct tpm_chip *chip = container_of(dev, struct tpm_chip, > dev); > > down_write(&chip->ops_sem); > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > if (!tpm_chip_start(chip)) { > tpm2_shutdown(chip, TPM2_SU_CLEAR); > tpm_chip_stop(chip); > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip > *chip) > > /* Make the driver uncallable. */ > down_write(&chip->ops_sem); > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > if (!tpm_chip_start(chip)) { > tpm2_shutdown(chip, TPM2_SU_CLEAR); > tpm_chip_stop(chip); I really don't think this is the right fix. The problem is that these two functions are trying to open code tpm_try_get_ops/tpm_put_ops (only really for the tpm2 shutdown) because they want to NULL out the ops before final mutex unlock. The problem with the current open coding is it doesn't shut down the clock if required (not really a problem for shutdown, but might cause issues for simple rmmod). I think this is because no-one noticed the open coding when get/put was updated. This code should all be abstracted into a single function and shared with tpm_try_get_ops/tpm_put_ops so we can't have this happen in future. Possibly there should be a chip shutdown function which is only active for TPM2 which does the correct try_get/shutdown/put operation and then a separate simple get mutex, null ops, put mutex one that's guaranteed to be called last. James
On Fri, Jul 10, 2020 at 12:08 PM James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > > On Thu, 2020-07-09 at 17:22 -0700, Andrey Pronin wrote: > > This patch prevents NULL dereferencing when using chip->ops while > > sending TPM2_Shutdown command if both tpm_class_shutdown handler and > > tpm_del_char_device are called during system shutdown. > > > > Both these handlers set chip->ops to NULL but don't check if it's > > already NULL when they are called before using it. > > > > This issue was revealed in Chrome OS after a recent set of changes > > to the unregister order for spi controllers, such as: > > b4c6230bb0ba spi: Fix controller unregister order > > f40913d2dca1 spi: pxa2xx: Fix controller unregister order > > and similar for other controllers. > > > > Signed-off-by: Andrey Pronin <apronin@chromium.org> > > --- > > drivers/char/tpm/tpm-chip.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm- > > chip.c > > index 8c77e88012e9..a410ca40a3c5 100644 > > --- a/drivers/char/tpm/tpm-chip.c > > +++ b/drivers/char/tpm/tpm-chip.c > > @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev) > > struct tpm_chip *chip = container_of(dev, struct tpm_chip, > > dev); > > > > down_write(&chip->ops_sem); > > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > > if (!tpm_chip_start(chip)) { > > tpm2_shutdown(chip, TPM2_SU_CLEAR); > > tpm_chip_stop(chip); > > @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip > > *chip) > > > > /* Make the driver uncallable. */ > > down_write(&chip->ops_sem); > > - if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { > > if (!tpm_chip_start(chip)) { > > tpm2_shutdown(chip, TPM2_SU_CLEAR); > > tpm_chip_stop(chip); > > I really don't think this is the right fix. The problem is that these > two functions are trying to open code tpm_try_get_ops/tpm_put_ops (only > really for the tpm2 shutdown) because they want to NULL out the ops > before final mutex unlock. The problem with the current open coding is > it doesn't shut down the clock if required (not really a problem for > shutdown, but might cause issues for simple rmmod). I think this is > because no-one noticed the open coding when get/put was updated. > > This code should all be abstracted into a single function and shared > with tpm_try_get_ops/tpm_put_ops so we can't have this happen in > future. Possibly there should be a chip shutdown function which is > only active for TPM2 which does the correct try_get/shutdown/put > operation and then a separate simple get mutex, null ops, put mutex one > that's guaranteed to be called last. Yes, went for a minimal patch here to stop kernel panics, didn't try to refactor. Note that we do hold chip->ops_sem in both cases, and it's a write-lock, not a read-lock (as tpm_try_get_ops uses) since we are changing chip->ops. Thanks to this write-lock there, shouldn't be parallel operations that use chip->ops (so not locking chip->tpm_mutex shouldn't affect it). So, if I understand the idea right, can refactor to something like: 1) extract common code between tpm_del_char_device and tpm_class_shutdown into a shared method; 2) further extract the part between up/down(chip->ops_sem) to be re-used between tpm_try_get_ops/tpm_put_ops and this flow; 3) still have down_write/up_write in this flow vs get/put_device + down_read/up_read in tpm_try_get_ops case. Please let me know if that's a bad idea. Will be unavailable next week, but will continue after that. > > James >
On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote: > > Why does not tpm_del_char_device need this? > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does* > need the fix. When tpm_class_shutdown is called it sets chip->ops to > NULL. If tpm_del_char_device is called after that, it doesn't check if > chip->ops is NULL (normal kernel API and char device API calls go > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to > call tpm2_shutdown(), which tries sending the command and dereferences > chip->ops. It's a typo, yes. Sorry about that. tpm_class_shutdown() is essentially tail of tpm_del_char_device(). To clean things up, I'd suggest dropping tpm_del_char_device() and call tpm_class_shutdown() in tpm_chip_unregisters() along, and open coding things that prepend it in tpm_del_char_device(). /Jarkko
On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote: > > > Why does not tpm_del_char_device need this? > > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does* > > need the fix. When tpm_class_shutdown is called it sets chip->ops to > > NULL. If tpm_del_char_device is called after that, it doesn't check if > > chip->ops is NULL (normal kernel API and char device API calls go > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to > > call tpm2_shutdown(), which tries sending the command and dereferences > > chip->ops. > > It's a typo, yes. Sorry about that. > > tpm_class_shutdown() is essentially tail of tpm_del_char_device(). > > To clean things up, I'd suggest dropping tpm_del_char_device() and > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open > coding things that prepend it in tpm_del_char_device(). > Personally I would have preferred two separate patches, one to fix the immediate problem (with Cc: stable) and one for the cleanup, but I guess merging both into one is ok as long as it is marked for stable. Thanks, Guenter
On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote: > On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote: > > > > Why does not tpm_del_char_device need this? > > > > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does* > > > need the fix. When tpm_class_shutdown is called it sets chip->ops to > > > NULL. If tpm_del_char_device is called after that, it doesn't check if > > > chip->ops is NULL (normal kernel API and char device API calls go > > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to > > > call tpm2_shutdown(), which tries sending the command and dereferences > > > chip->ops. > > > > It's a typo, yes. Sorry about that. > > > > tpm_class_shutdown() is essentially tail of tpm_del_char_device(). > > > > To clean things up, I'd suggest dropping tpm_del_char_device() and > > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open > > coding things that prepend it in tpm_del_char_device(). > > > > Personally I would have preferred two separate patches, one to fix the > immediate problem (with Cc: stable) and one for the cleanup, but I > guess merging both into one is ok as long as it is marked for stable. > > Thanks, > Guenter Not sure about stable as this issue does not afaik concern earlier kernel versions? /Jarkko
On Thu, Jul 16, 2020 at 10:28 AM Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> wrote: > > On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote: > > On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen > > <jarkko.sakkinen@linux.intel.com> wrote: > > > > > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote: > > > > > Why does not tpm_del_char_device need this? > > > > > > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does* > > > > need the fix. When tpm_class_shutdown is called it sets chip->ops to > > > > NULL. If tpm_del_char_device is called after that, it doesn't check if > > > > chip->ops is NULL (normal kernel API and char device API calls go > > > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to > > > > call tpm2_shutdown(), which tries sending the command and dereferences > > > > chip->ops. > > > > > > It's a typo, yes. Sorry about that. > > > > > > tpm_class_shutdown() is essentially tail of tpm_del_char_device(). > > > > > > To clean things up, I'd suggest dropping tpm_del_char_device() and > > > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open > > > coding things that prepend it in tpm_del_char_device(). > > > > > > > Personally I would have preferred two separate patches, one to fix the > > immediate problem (with Cc: stable) and one for the cleanup, but I > > guess merging both into one is ok as long as it is marked for stable. > > > > Thanks, > > Guenter > > Not sure about stable as this issue does not afaik concern earlier > kernel versions? > I just had a quick look into linux-5.4.y, and it seemed to me that it is affected. Maybe I am wrong. Either case, we already applied this patch to all affected ChromeOS kernel branches, so from our perspective it doesn't really matter. Thanks, Guenter
On Thu, Jul 16, 2020 at 10:38:00AM -0700, Guenter Roeck wrote: > On Thu, Jul 16, 2020 at 10:28 AM Jarkko Sakkinen > <jarkko.sakkinen@linux.intel.com> wrote: > > > > On Tue, Jul 14, 2020 at 08:48:38AM -0700, Guenter Roeck wrote: > > > On Tue, Jul 14, 2020 at 4:32 AM Jarkko Sakkinen > > > <jarkko.sakkinen@linux.intel.com> wrote: > > > > > > > > On Fri, Jul 10, 2020 at 11:25:44AM -0700, Andrey Pronin wrote: > > > > > > Why does not tpm_del_char_device need this? > > > > > > > > > > "Not" is a typo in the sentence above, right? tpm_del_char_device *does* > > > > > need the fix. When tpm_class_shutdown is called it sets chip->ops to > > > > > NULL. If tpm_del_char_device is called after that, it doesn't check if > > > > > chip->ops is NULL (normal kernel API and char device API calls go > > > > > through tpm_try_get_ops, but tpm_del_char_device doesn't) and proceeds to > > > > > call tpm2_shutdown(), which tries sending the command and dereferences > > > > > chip->ops. > > > > > > > > It's a typo, yes. Sorry about that. > > > > > > > > tpm_class_shutdown() is essentially tail of tpm_del_char_device(). > > > > > > > > To clean things up, I'd suggest dropping tpm_del_char_device() and > > > > call tpm_class_shutdown() in tpm_chip_unregisters() along, and open > > > > coding things that prepend it in tpm_del_char_device(). > > > > > > > > > > Personally I would have preferred two separate patches, one to fix the > > > immediate problem (with Cc: stable) and one for the cleanup, but I > > > guess merging both into one is ok as long as it is marked for stable. > > > > > > Thanks, > > > Guenter > > > > Not sure about stable as this issue does not afaik concern earlier > > kernel versions? > > > > I just had a quick look into linux-5.4.y, and it seemed to me that it > is affected. Maybe I am wrong. Either case, we already applied this > patch to all affected ChromeOS kernel branches, so from our > perspective it doesn't really matter. > > Thanks, > Guenter I'm fine with cc'ing stable after consideration given the benefits. Given that conclusion, it is better to break this down to two part series as you proposed. /Jarkko
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 8c77e88012e9..a410ca40a3c5 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -296,7 +296,7 @@ static int tpm_class_shutdown(struct device *dev) struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev); down_write(&chip->ops_sem); - if (chip->flags & TPM_CHIP_FLAG_TPM2) { + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { if (!tpm_chip_start(chip)) { tpm2_shutdown(chip, TPM2_SU_CLEAR); tpm_chip_stop(chip); @@ -479,7 +479,7 @@ static void tpm_del_char_device(struct tpm_chip *chip) /* Make the driver uncallable. */ down_write(&chip->ops_sem); - if (chip->flags & TPM_CHIP_FLAG_TPM2) { + if (chip->ops && (chip->flags & TPM_CHIP_FLAG_TPM2)) { if (!tpm_chip_start(chip)) { tpm2_shutdown(chip, TPM2_SU_CLEAR); tpm_chip_stop(chip);
This patch prevents NULL dereferencing when using chip->ops while sending TPM2_Shutdown command if both tpm_class_shutdown handler and tpm_del_char_device are called during system shutdown. Both these handlers set chip->ops to NULL but don't check if it's already NULL when they are called before using it. This issue was revealed in Chrome OS after a recent set of changes to the unregister order for spi controllers, such as: b4c6230bb0ba spi: Fix controller unregister order f40913d2dca1 spi: pxa2xx: Fix controller unregister order and similar for other controllers. Signed-off-by: Andrey Pronin <apronin@chromium.org> --- drivers/char/tpm/tpm-chip.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)