diff mbox

[V2] PCI/portdrv: do not disable device on reboot/shutdown

Message ID 82656e20-e821-1944-3399-1667ceb27719@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Sinan Kaya May 24, 2018, 11:43 a.m. UTC
On 5/23/2018 6:57 PM, Sinan Kaya wrote:
>> The crash seems to indicate that the hpsa device attempted a DMA after
>> we cleared the Root Port's PCI_COMMAND_MASTER, which means
>> hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
>> shutdown methods don't disable device DMA, so it's in good company).
> All drivers are expected to shutdown DMA and interrupts in their shutdown()
> routines. They can skip removing threads, data structures etc. but DMA and
> interrupt disabling are required. This is the difference between shutdown()
> and remove() callbacks.

I found this note yesterday to see why we are not disabling the devices in the
PCI core itself. 

pci_device_remove()

	/*
	 * We would love to complain here if pci_dev->is_enabled is set, that
	 * the driver should have called pci_disable_device(), but the
	 * unfortunate fact is there are too many odd BIOS and bridge setups
	 * that don't like drivers doing that all of the time.
	 * Oh well, we can dream of sane hardware when we sleep, no matter how
	 * horrible the crap we have to deal with is when we are awake...
	 */ 

Ryan, can you discard the previous patch and test this one instead? remove() path
in hpsa driver seems to call pci_disable_device() via 

hpsa_remove_one()
	hpsa_free_pci_init()

but nothing on the shutdown path.

Comments

Bjorn Helgaas May 24, 2018, 1:07 p.m. UTC | #1
On Thu, May 24, 2018 at 07:43:05AM -0400, Sinan Kaya wrote:
> On 5/23/2018 6:57 PM, Sinan Kaya wrote:
> >> The crash seems to indicate that the hpsa device attempted a DMA after
> >> we cleared the Root Port's PCI_COMMAND_MASTER, which means
> >> hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
> >> shutdown methods don't disable device DMA, so it's in good company).
> > All drivers are expected to shutdown DMA and interrupts in their shutdown()
> > routines. They can skip removing threads, data structures etc. but DMA and
> > interrupt disabling are required. This is the difference between shutdown()
> > and remove() callbacks.
> 
> I found this note yesterday to see why we are not disabling the
> devices in the PCI core itself. 
> 
> pci_device_remove()
> 
> 	/*
> 	 * We would love to complain here if pci_dev->is_enabled is set, that
> 	 * the driver should have called pci_disable_device(), but the
> 	 * unfortunate fact is there are too many odd BIOS and bridge setups
> 	 * that don't like drivers doing that all of the time.
> 	 * Oh well, we can dream of sane hardware when we sleep, no matter how
> 	 * horrible the crap we have to deal with is when we are awake...
> 	 */ 
> 
> Ryan, can you discard the previous patch and test this one instead?
> remove() path in hpsa driver seems to call pci_disable_device() via 
> 
> hpsa_remove_one()
> 	hpsa_free_pci_init()
> 
> but nothing on the shutdown path.
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 4ed3d26..3823f04 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -8651,6 +8651,7 @@ static void hpsa_shutdown(struct pci_dev *pdev)
>         h->access.set_intr_mask(h, HPSA_INTR_OFF);
>         hpsa_free_irqs(h);                      /* init_one 4 */
>         hpsa_disable_interrupt_mode(h);         /* pci_init 2 */
> +       pci_disable_device(h->pdev);
>  }

I suspect this will make things "work" (if the device can't attempt
DMA, no Unsupported Request error will occur).

But I'm concerned that the reason for the DMA might that hpsa is
transferring buffers from system memory to the hpsa device, and if we
arbitrarily terminate those transfers with pci_disable_device(), we
may leave the hpsa device in an inconsistent state, e.g., with a dirty
filesystem.

But we really need guidance from an hpsa expert.  I don't know the
filesystem/SCSI/hpsa details.

Bjorn
Sinan Kaya May 24, 2018, 1:35 p.m. UTC | #2
On 2018-05-24 09:07, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 07:43:05AM -0400, Sinan Kaya wrote:
>> On 5/23/2018 6:57 PM, Sinan Kaya wrote:
>> >> The crash seems to indicate that the hpsa device attempted a DMA after
>> >> we cleared the Root Port's PCI_COMMAND_MASTER, which means
>> >> hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
>> >> shutdown methods don't disable device DMA, so it's in good company).
>> > All drivers are expected to shutdown DMA and interrupts in their shutdown()
>> > routines. They can skip removing threads, data structures etc. but DMA and
>> > interrupt disabling are required. This is the difference between shutdown()
>> > and remove() callbacks.
>> 
>> I found this note yesterday to see why we are not disabling the
>> devices in the PCI core itself.
>> 
>> pci_device_remove()
>> 
>> 	/*
>> 	 * We would love to complain here if pci_dev->is_enabled is set, that
>> 	 * the driver should have called pci_disable_device(), but the
>> 	 * unfortunate fact is there are too many odd BIOS and bridge setups
>> 	 * that don't like drivers doing that all of the time.
>> 	 * Oh well, we can dream of sane hardware when we sleep, no matter 
>> how
>> 	 * horrible the crap we have to deal with is when we are awake...
>> 	 */
>> 
>> Ryan, can you discard the previous patch and test this one instead?
>> remove() path in hpsa driver seems to call pci_disable_device() via
>> 
>> hpsa_remove_one()
>> 	hpsa_free_pci_init()
>> 
>> but nothing on the shutdown path.
>> 
>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
>> index 4ed3d26..3823f04 100644
>> --- a/drivers/scsi/hpsa.c
>> +++ b/drivers/scsi/hpsa.c
>> @@ -8651,6 +8651,7 @@ static void hpsa_shutdown(struct pci_dev *pdev)
>>         h->access.set_intr_mask(h, HPSA_INTR_OFF);
>>         hpsa_free_irqs(h);                      /* init_one 4 */
>>         hpsa_disable_interrupt_mode(h);         /* pci_init 2 */
>> +       pci_disable_device(h->pdev);
>>  }
> 
> I suspect this will make things "work" (if the device can't attempt
> DMA, no Unsupported Request error will occur).
> 
> But I'm concerned that the reason for the DMA might that hpsa is
> transferring buffers from system memory to the hpsa device, and if we
> arbitrarily terminate those transfers with pci_disable_device(), we
> may leave the hpsa device in an inconsistent state, e.g., with a dirty
> filesystem.
> 
> But we really need guidance from an hpsa expert.  I don't know the
> filesystem/SCSI/hpsa details.

Agreed,

We can drop shutdown and use the remove callback. Remove is supposed to 
do a safe cleanup.

> 
> Bjorn
Don Brace May 24, 2018, 1:37 p.m. UTC | #3
> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@kernel.org]
> Sent: Thursday, May 24, 2018 8:07 AM
> To: Sinan Kaya <okaya@codeaurora.org>
> Cc: linux-pci@vger.kernel.org; timur@codeaurora.org; ryan@finnie.org; linux-
> arm-msm@vger.kernel.org; linux-arm-kernel@lists.infradead.org;
> stable@vger.kernel.org; Bjorn Helgaas <bhelgaas@google.com>; Rafael J.
> Wysocki <rafael.j.wysocki@intel.com>; Greg Kroah-Hartman
> <gregkh@linuxfoundation.org>; Thomas Gleixner <tglx@linutronix.de>; Kate
> Stewart <kstewart@linuxfoundation.org>; Frederick Lawler
> <fred@fredlawl.com>; Dongdong Liu <liudongdong3@huawei.com>; Mika
> Westerberg <mika.westerberg@linux.intel.com>; open list <linux-
> kernel@vger.kernel.org>; Don Brace <don.brace@microsemi.com>;
> esc.storagedev <esc.storagedev@microsemi.com>; linux-scsi@vger.kernel.org
> Subject: Re: [PATCH V2] PCI/portdrv: do not disable device on reboot/shutdown
> 
> EXTERNAL EMAIL
> 
> 
> On Thu, May 24, 2018 at 07:43:05AM -0400, Sinan Kaya wrote:
> > On 5/23/2018 6:57 PM, Sinan Kaya wrote:
> > >> The crash seems to indicate that the hpsa device attempted a DMA after
> > >> we cleared the Root Port's PCI_COMMAND_MASTER, which means
> > >> hpsa_shutdown() didn't stop DMA from the device (it looks like *most*
> > >> shutdown methods don't disable device DMA, so it's in good company).
> > > All drivers are expected to shutdown DMA and interrupts in their shutdown()
> > > routines. They can skip removing threads, data structures etc. but DMA and
> > > interrupt disabling are required. This is the difference between shutdown()
> > > and remove() callbacks.
> >
> > I found this note yesterday to see why we are not disabling the
> > devices in the PCI core itself.
> >
> > pci_device_remove()
> >
> >       /*
> >        * We would love to complain here if pci_dev->is_enabled is set, that
> >        * the driver should have called pci_disable_device(), but the
> >        * unfortunate fact is there are too many odd BIOS and bridge setups
> >        * that don't like drivers doing that all of the time.
> >        * Oh well, we can dream of sane hardware when we sleep, no matter how
> >        * horrible the crap we have to deal with is when we are awake...
> >        */
> >
> > Ryan, can you discard the previous patch and test this one instead?
> > remove() path in hpsa driver seems to call pci_disable_device() via
> >
> > hpsa_remove_one()
> >       hpsa_free_pci_init()
> >
> > but nothing on the shutdown path.
> >
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 4ed3d26..3823f04 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -8651,6 +8651,7 @@ static void hpsa_shutdown(struct pci_dev *pdev)
> >         h->access.set_intr_mask(h, HPSA_INTR_OFF);
> >         hpsa_free_irqs(h);                      /* init_one 4 */
> >         hpsa_disable_interrupt_mode(h);         /* pci_init 2 */
> > +       pci_disable_device(h->pdev);
> >  }
> 
> I suspect this will make things "work" (if the device can't attempt
> DMA, no Unsupported Request error will occur).
> 
> But I'm concerned that the reason for the DMA might that hpsa is
> transferring buffers from system memory to the hpsa device, and if we
> arbitrarily terminate those transfers with pci_disable_device(), we
> may leave the hpsa device in an inconsistent state, e.g., with a dirty
> filesystem.
> 
> But we really need guidance from an hpsa expert.  I don't know the
> filesystem/SCSI/hpsa details.
> 
> Bjorn

It's most likely OCSD traffic that will stop when bus mastering is turned off.
So, I'll run some tests on my end before ACKing your patch.

Thanks,
Don Brace
ESC - Smart Storage
Microsemi Corporation
Sinan Kaya May 28, 2018, 9:25 p.m. UTC | #4
On 5/24/2018 6:37 AM, Don Brace wrote:
>> But we really need guidance from an hpsa expert.  I don't know the
>> filesystem/SCSI/hpsa details.
>>
>> Bjorn
> It's most likely OCSD traffic that will stop when bus mastering is turned off.
> So, I'll run some tests on my end before ACKing your patch.

Can you test V3 instead of this?

I don't think adding pci_disable_device() to shutdown() is enough to put
the HW into safe state. I moved clean  up responsibility to remove() instead
of shutdown().
diff mbox

Patch

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 4ed3d26..3823f04 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -8651,6 +8651,7 @@  static void hpsa_shutdown(struct pci_dev *pdev)
        h->access.set_intr_mask(h, HPSA_INTR_OFF);
        hpsa_free_irqs(h);                      /* init_one 4 */
        hpsa_disable_interrupt_mode(h);         /* pci_init 2 */
+       pci_disable_device(h->pdev);
 }