Message ID | 20190108205043.3122-1-logang@deltatee.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | scsi: isci: initialize shost fully before calling scsi_add_host() | expand |
Logan Gunthorpe <logang@deltatee.com> writes: > scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates > the command size to allocate based on the prot_capabilities. In the > isci driver, scsi_host_set_prot() is called after scsi_add_host() > so the command size gets calculated to be smaller than it needs to be. > Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command > assuming it was sized correctly and a buffer overrun may occur. [...] > To prevent this, the calls to scsi_host_set_prot() are moved into > isci_host_alloc() before the call to scsi_add_host(). Out of caution, > also move the similar call to scsi_host_set_guard(). > > Fixes: 3d2d75254915 ("[SCSI] isci: T10 DIF support") > Link: http://lkml.kernel.org/r/da851333-eadd-163a-8c78-e1f4ec5ec857@deltatee.com > Signed-off-by: Logan Gunthorpe <logang@deltatee.com> > Cc: Intel SCU Linux support <intel-linux-scu@intel.com> > Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> > Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com> > Cc: Christoph Hellwig <hch@lst.de> > Cc: Jens Axboe <axboe@kernel.dk> > Cc: Jeff Moyer <jmoyer@redhat.com> Nice job, and excellent commit message. We'll need a similar patch for lpfc. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> > --- > drivers/scsi/isci/init.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c > index 68b90c4f79a3..1727d0c71b12 100644 > --- a/drivers/scsi/isci/init.c > +++ b/drivers/scsi/isci/init.c > @@ -576,6 +576,13 @@ static struct isci_host *isci_host_alloc(struct pci_dev *pdev, int id) > shost->max_lun = ~0; > shost->max_cmd_len = MAX_COMMAND_SIZE; > > + /* turn on DIF support */ > + scsi_host_set_prot(shost, > + SHOST_DIF_TYPE1_PROTECTION | > + SHOST_DIF_TYPE2_PROTECTION | > + SHOST_DIF_TYPE3_PROTECTION); > + scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC); > + > err = scsi_add_host(shost, &pdev->dev); > if (err) > goto err_shost; > @@ -663,13 +670,6 @@ static int isci_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > goto err_host_alloc; > } > pci_info->hosts[i] = h; > - > - /* turn on DIF support */ > - scsi_host_set_prot(to_shost(h), > - SHOST_DIF_TYPE1_PROTECTION | > - SHOST_DIF_TYPE2_PROTECTION | > - SHOST_DIF_TYPE3_PROTECTION); > - scsi_host_set_guard(to_shost(h), SHOST_DIX_GUARD_CRC); > } > > err = isci_setup_interrupts(pdev);
On 1/8/19 1:50 PM, Logan Gunthorpe wrote: > scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates > the command size to allocate based on the prot_capabilities. In the > isci driver, scsi_host_set_prot() is called after scsi_add_host() > so the command size gets calculated to be smaller than it needs to be. > Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command > assuming it was sized correctly and a buffer overrun may occur. > > However, seeing blk_mq_alloc_rqs() rounds up to the nearest cache line > size, the mistake can go unnoticed. > > The bug was noticed after the struct request size was reduced by > commit 9d037ad707ed ("block: remove req->timeout_list") > > Which likely reduced the allocated space for the request by an entire > cache line, enough that the overflow could be hit and it caused a panic, > on boot, at: > > RIP: 0010:t10_pi_complete+0x77/0x1c0 > Call Trace: > <IRQ> > sd_done+0xf5/0x340 > scsi_finish_command+0xc3/0x120 > blk_done_softirq+0x83/0xb0 > __do_softirq+0xa1/0x2e6 > irq_exit+0xbc/0xd0 > call_function_single_interrupt+0xf/0x20 > </IRQ> > > sd_done() would call scsi_prot_sg_count() which reads the number of > entities in 'prot_sdb', but seeing 'prot_sdb' is located after the end of > the allocated space it reads a garbage number and erroneously calls > t10_pi_complete(). > > To prevent this, the calls to scsi_host_set_prot() are moved into > isci_host_alloc() before the call to scsi_add_host(). Out of caution, > also move the similar call to scsi_host_set_guard(). Nice work! Reviewed-by: Jens Axboe <axboe@kernel.dk>
Logan, > To prevent this, the calls to scsi_host_set_prot() are moved into > isci_host_alloc() before the call to scsi_add_host(). Out of caution, > also move the similar call to scsi_host_set_guard(). Applied to 5.0/scsi-fixes. Thanks much!
This looks good. I wonder if there is any good way to prevent other drivers from picking up this bug byt using a better interface, but that should not delay your fix.
On 09/01/2019 18:41, Christoph Hellwig wrote: > This looks good. I wonder if there is any good way to prevent other > drivers from picking up this bug byt using a better interface, but > that should not delay your fix. > > . > I noticed that hisi_sas has this same problem but I forgot to fix it. So how about just drop these APIs and let the user set the shost protection parameters directly, like other shost parameters, which should make it a bit clearer when these should be set, i.e. before scsi_add_host()? John
John, > So how about just drop these APIs and let the user set the shost > protection parameters directly, like other shost parameters, The protection interfaces here obviously predate the block layer allocation changes that made this particular issue pop up. > which should make it a bit clearer when these should be set, > i.e. before scsi_add_host()? In general, I am not so keen on the somewhat messy intersection between the host parameters and the host template. The static host templates made lots of sense in the days of Seagate ST01 and fixed hardware capabilities. But reality is that most modern controllers have to query firmware interfaces to figure out what their actual capabilities are. So in this case I think that accessor functions are actually better because they allow us to print a big fat warning when you twiddle something you shouldn't post-initialization. So that's something I think we could--and should--improve.
On 12/01/2019 02:34, Martin K. Petersen wrote: > > John, > >> So how about just drop these APIs and let the user set the shost >> protection parameters directly, like other shost parameters, > > The protection interfaces here obviously predate the block layer > allocation changes that made this particular issue pop up. > >> which should make it a bit clearer when these should be set, >> i.e. before scsi_add_host()? > > In general, I am not so keen on the somewhat messy intersection between > the host parameters and the host template. The static host templates > made lots of sense in the days of Seagate ST01 and fixed hardware > capabilities. But reality is that most modern controllers have to query > firmware interfaces to figure out what their actual capabilities are. Hi Martin, I am not suggested setting the parameters via scsi host template, but rather dynamically (as we currently do) but just drop the set helper functions, like: shost->max_channel = 1; shost->max_cmd_len = 16; ... if (hisi_hba->prot_mask) { dev_info(dev, "Registering for DIF/DIX prot_mask=0x%x\n", prot_mask); - scsi_host_set_prot(hisi_hba->shost, prot_mask); + shost->prot_capabilities = prot_mask; } rc = scsi_add_host(shost, dev); if (rc) goto err_out_ha; rc = sas_register_ha(sha); if (rc) goto err_out_register_ha; I find that it is not crystal clear when scsi_host_set_prot() and scsi_host_set_guard() should be called, but not so for setting the shost parameters directly, which is common. > > So in this case I think that accessor functions are actually better > because they allow us to print a big fat warning when you twiddle > something you shouldn't post-initialization. So that's something I think > we could--and should--improve. > Sure, this is an alternative, but I would rather make it obvious when these parameters should be set so that this would not be required. Thanks, John
Hi John, >> So in this case I think that accessor functions are actually better >> because they allow us to print a big fat warning when you twiddle >> something you shouldn't post-initialization. So that's something I think >> we could--and should--improve. >> > Sure, this is an alternative, but I would rather make it obvious when > these parameters should be set so that this would not be required. I would like to have a mechanism in place that warns if you twiddle things that break assumptions made at host registration time. That's not a scenario the old registration interface was designed to handle. I am not sure I agree with your assertion that setting these masks in the struct prior to scsi_add_host() makes this ordering requirement much more obvious. It is not like you are passing in a list of parameters and then receiving a separately instantiated immutable scsi_host object. You are performing an operation on something you already have and own. That's why I commented that the current intersection between compile-time static host template, dynamically discovered pre-registration scsi_host parameters, and the registered runtime scsi_host struct is somewhat messy. Btw. I have no attachment to the prot wrappers whatsoever. The reason they exist is that the SCSI integrity support was optional. And therefore we had stub functions so things could be compiled without any of the integrity fields being present in the various SCSI structs. So I don't have any problem killing the wrappers except they may actually be handy for regressions like this one where you could #error if the driver writer violates the ordering requirement.
On 16/01/2019 02:54, Martin K. Petersen wrote: > > Hi John, > Hi Martin, >>> So in this case I think that accessor functions are actually better >>> because they allow us to print a big fat warning when you twiddle >>> something you shouldn't post-initialization. So that's something I think >>> we could--and should--improve. >>> >> Sure, this is an alternative, but I would rather make it obvious when >> these parameters should be set so that this would not be required. > > I would like to have a mechanism in place that warns if you twiddle > things that break assumptions made at host registration time. Yes, something more robust would be good. That's not > a scenario the old registration interface was designed to handle. > > I am not sure I agree with your assertion that setting these masks in > the struct prior to scsi_add_host() makes this ordering requirement much > more obvious. It is not like you are passing in a list of parameters and > then receiving a separately instantiated immutable scsi_host object. You > are performing an operation on something you already have and own. > > That's why I commented that the current intersection between > compile-time static host template, dynamically discovered > pre-registration scsi_host parameters, and the registered runtime > scsi_host struct is somewhat messy. > > Btw. I have no attachment to the prot wrappers whatsoever. The reason > they exist is that the SCSI integrity support was optional. And > therefore we had stub functions so things could be compiled without any > of the integrity fields being present in the various SCSI structs. I never noticed stubs for setting/getting Scsi_host.prot_{capabilities,guard_type} So I > don't have any problem killing the wrappers except they may actually be > handy for regressions like this one where you could #error if the driver > writer violates the ordering requirement. > We set many Scsi_host parameters without such safeguarding, and I don't know what's special about these protection-related members. Thanks, John
diff --git a/drivers/scsi/isci/init.c b/drivers/scsi/isci/init.c index 68b90c4f79a3..1727d0c71b12 100644 --- a/drivers/scsi/isci/init.c +++ b/drivers/scsi/isci/init.c @@ -576,6 +576,13 @@ static struct isci_host *isci_host_alloc(struct pci_dev *pdev, int id) shost->max_lun = ~0; shost->max_cmd_len = MAX_COMMAND_SIZE; + /* turn on DIF support */ + scsi_host_set_prot(shost, + SHOST_DIF_TYPE1_PROTECTION | + SHOST_DIF_TYPE2_PROTECTION | + SHOST_DIF_TYPE3_PROTECTION); + scsi_host_set_guard(shost, SHOST_DIX_GUARD_CRC); + err = scsi_add_host(shost, &pdev->dev); if (err) goto err_shost; @@ -663,13 +670,6 @@ static int isci_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto err_host_alloc; } pci_info->hosts[i] = h; - - /* turn on DIF support */ - scsi_host_set_prot(to_shost(h), - SHOST_DIF_TYPE1_PROTECTION | - SHOST_DIF_TYPE2_PROTECTION | - SHOST_DIF_TYPE3_PROTECTION); - scsi_host_set_guard(to_shost(h), SHOST_DIX_GUARD_CRC); } err = isci_setup_interrupts(pdev);
scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates the command size to allocate based on the prot_capabilities. In the isci driver, scsi_host_set_prot() is called after scsi_add_host() so the command size gets calculated to be smaller than it needs to be. Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command assuming it was sized correctly and a buffer overrun may occur. However, seeing blk_mq_alloc_rqs() rounds up to the nearest cache line size, the mistake can go unnoticed. The bug was noticed after the struct request size was reduced by commit 9d037ad707ed ("block: remove req->timeout_list") Which likely reduced the allocated space for the request by an entire cache line, enough that the overflow could be hit and it caused a panic, on boot, at: RIP: 0010:t10_pi_complete+0x77/0x1c0 Call Trace: <IRQ> sd_done+0xf5/0x340 scsi_finish_command+0xc3/0x120 blk_done_softirq+0x83/0xb0 __do_softirq+0xa1/0x2e6 irq_exit+0xbc/0xd0 call_function_single_interrupt+0xf/0x20 </IRQ> sd_done() would call scsi_prot_sg_count() which reads the number of entities in 'prot_sdb', but seeing 'prot_sdb' is located after the end of the allocated space it reads a garbage number and erroneously calls t10_pi_complete(). To prevent this, the calls to scsi_host_set_prot() are moved into isci_host_alloc() before the call to scsi_add_host(). Out of caution, also move the similar call to scsi_host_set_guard(). Fixes: 3d2d75254915 ("[SCSI] isci: T10 DIF support") Link: http://lkml.kernel.org/r/da851333-eadd-163a-8c78-e1f4ec5ec857@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Intel SCU Linux support <intel-linux-scu@intel.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Moyer <jmoyer@redhat.com> --- drivers/scsi/isci/init.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)