diff mbox

[v2.6.38-rc7] Revert "libata: ahci_start_engine compliant to AHCI spec"

Message ID BANLkTimZ9pe-FhkJYHTG1Jx8MUPrMHSFbA@mail.gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Jian Peng May 18, 2011, 9:25 p.m. UTC
Sure, Please try this. Thanks,


2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>

> On Wednesday, May 18, 2011, Jian Peng wrote:
> > Hi, Valdis/Rafael/Michael,
> >
> > Could you help me test the following change?
> >
> > After reverting 81ca7e4, add 5ms delay as follow since that seems also
> > fixing the issue on my SATA host controller that requires 81ca7e4.
> >
> > In drivers/ata/libahci.c, inside ahci_hardreset() function,
> >
> >
> > 1333 <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1333>
> > ahci_start_engine
> > <http://lxr.linux.no/linux+*/+code=ahci_start_engine>(ap
> > <http://lxr.linux.no/linux+*/+code=ap>);
> >
> >             msleep(5);1334
> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1334>1335
> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1335>        if
> > (online <http://lxr.linux.no/linux+*/+code=online>)1336
> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1336>
> >   *class <http://lxr.linux.no/linux+*/+code=class> = ahci_dev_classify
> > <http://lxr.linux.no/linux+*/+code=ahci_dev_classify>(ap
> > <http://lxr.linux.no/linux+*/+code=ap>);
> >
> > Since my host controller requires time to switch internal state to be
> ready.
> > Please let me know your testing result.
>
> Could you simply post a patch?
>
> Rafael
>

Comments

Jian Peng May 19, 2011, 12:14 a.m. UTC | #1
Hi, Rafael,

I am using Dell E6410 with same AHCI host controller as your system, but
could not reproduce the bug you found out (using Ubuntu 11.04 and 2.6.39-rc6
kernel).
Do you know which config file I need change to perform this testing?

Thanks,
Jian

On Wed, May 18, 2011 at 2:25 PM, Jian Peng <jipeng2005@gmail.com> wrote:

> Sure, Please try this. Thanks,
>
> diff -Naur a/drivers/ata/libahci.c b/drivers/ata/libahci.c
> --- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
> +++ b/drivers/ata/libahci.c 2011-05-18 14:24:52.564614378 -0700
> @@ -539,27 +539,6 @@
>
>  {
>   void __iomem *port_mmio = ahci_port_base(ap);
>   u32 tmp;
> - u8 status;
> -
> - status = readl(port_mmio + PORT_TFDATA) & 0xFF;
> -
> - /*
> -  * At end of section 10.1 of AHCI spec (rev 1.3), it states
> -  * Software shall not set PxCMD.ST to 1 until it is determined
> -  * that a functoinal device is present on the port as determined by
> -  * PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
> -  *
> -  * Even though most AHCI host controllers work without this check,
>
> -  * specific controller will fail under this condition
> -  */
> - if (status & (ATA_BUSY | ATA_DRQ))
> -  return;
> - else {
> -  ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
> -
> -  if ((tmp & 0xf) != 0x3)
> -   return;
> - }
>
>   /* start DMA */
>   tmp = readl(port_mmio + PORT_CMD);
> @@ -1353,6 +1332,8 @@
>
>
>   ahci_start_engine(ap);
>
> + msleep(5);
> +
>   if (online)
>
>    *class = ahci_dev_classify(ap);
>
>
> 2011/5/18 Rafael J. Wysocki <rjw@sisk.pl>
>
>  On Wednesday, May 18, 2011, Jian Peng wrote:
>> > Hi, Valdis/Rafael/Michael,
>> >
>> > Could you help me test the following change?
>> >
>> > After reverting 81ca7e4, add 5ms delay as follow since that seems also
>> > fixing the issue on my SATA host controller that requires 81ca7e4.
>> >
>> > In drivers/ata/libahci.c, inside ahci_hardreset() function,
>> >
>> >
>> > 1333 <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1333>
>> > ahci_start_engine
>> > <http://lxr.linux.no/linux+*/+code=ahci_start_engine>(ap
>> > <http://lxr.linux.no/linux+*/+code=ap>);
>> >
>> >             msleep(5);1334
>> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1334>1335
>> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1335>        if
>> > (online <http://lxr.linux.no/linux+*/+code=online>)1336
>> > <http://lxr.linux.no/linux+*/drivers/ata/libahci.c#L1336>
>> >   *class <http://lxr.linux.no/linux+*/+code=class> = ahci_dev_classify
>> > <http://lxr.linux.no/linux+*/+code=ahci_dev_classify>(ap
>> > <http://lxr.linux.no/linux+*/+code=ap>);
>> >
>> > Since my host controller requires time to switch internal state to be
>> ready.
>> > Please let me know your testing result.
>>
>> Could you simply post a patch?
>>
>> Rafael
>>
>
>
Rafael Wysocki May 19, 2011, 10:19 a.m. UTC | #2
On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,

Hi,

> I am using Dell E6410 with same AHCI host controller as your system, but
> could not reproduce the bug you found out (using Ubuntu 11.04 and 2.6.39-rc6
> kernel).

OK, do I think correctly you tested without the patch you sent previously?

> Do you know which config file I need change to perform this testing?

I'm not sure what you mean.  The kernel .config?

Rafael
Jian Peng May 19, 2011, 4:42 p.m. UTC | #3
Hi, Rafael,

Yes. I tested without reverting the patch that caused failure on E6410 (with
same AHCI controller), The system is Ubuntu 11.04 with default 2.6.39-rc6
kernel (guess this should be same kernel as yours before reverting the
patch).

My system worked well with AC power cable plugging in and out at boot time,
at runtime, and during "dd if=/dev/sda5 of=/dev/null ..".

The config file I mentioned is those under /etc/ or other directories that
is used to control the power management.

Did you test my patch? What it the result?

Thanks,
Jian


2011/5/19 Rafael J. Wysocki <rjw@sisk.pl>

> On Thursday, May 19, 2011, Jian Peng wrote:
> > Hi, Rafael,
>
> Hi,
>
> > I am using Dell E6410 with same AHCI host controller as your system, but
> > could not reproduce the bug you found out (using Ubuntu 11.04 and
> 2.6.39-rc6
> > kernel).
>
> OK, do I think correctly you tested without the patch you sent previously?
>
> > Do you know which config file I need change to perform this testing?
>
> I'm not sure what you mean.  The kernel .config?
>
> Rafael
>
Rafael Wysocki May 19, 2011, 9:32 p.m. UTC | #4
On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,
> 
> Yes. I tested without reverting the patch that caused failure on E6410 (with
> same AHCI controller), The system is Ubuntu 11.04 with default 2.6.39-rc6
> kernel (guess this should be same kernel as yours before reverting the
> patch).
> 
> My system worked well with AC power cable plugging in and out at boot time,
> at runtime, and during "dd if=/dev/sda5 of=/dev/null ..".

What kind of disk did you use?  Mine is an Intel SSD.

> The config file I mentioned is those under /etc/ or other directories that
> is used to control the power management.

They aren't relevant to this issue.

> Did you test my patch? What it the result?

No, I didn't.  Unfortunately, I won't be able to do any tests on the affected
machine in the near future (most probably for the next two weeks or so).

Thanks,
Rafael
Jian Peng May 19, 2011, 9:57 p.m. UTC | #5
Hi, Rafael,

I am using internal WDC laptop HDD. I will give SDD a try even though I only
have non-Intel SDD now.

Is your SDD the boot disk (by swapping the built-in HDD out) or the extra
one installed using device bay on Dell laptop?

Thanks,
Jian


2011/5/19 Rafael J. Wysocki <rjw@sisk.pl>

> On Thursday, May 19, 2011, Jian Peng wrote:
> > Hi, Rafael,
> >
> > Yes. I tested without reverting the patch that caused failure on E6410
> (with
> > same AHCI controller), The system is Ubuntu 11.04 with default 2.6.39-rc6
> > kernel (guess this should be same kernel as yours before reverting the
> > patch).
> >
> > My system worked well with AC power cable plugging in and out at boot
> time,
> > at runtime, and during "dd if=/dev/sda5 of=/dev/null ..".
>
> What kind of disk did you use?  Mine is an Intel SSD.
>
> > The config file I mentioned is those under /etc/ or other directories
> that
> > is used to control the power management.
>
> They aren't relevant to this issue.
>
> > Did you test my patch? What it the result?
>
> No, I didn't.  Unfortunately, I won't be able to do any tests on the
> affected
> machine in the near future (most probably for the next two weeks or so).
>
> Thanks,
> Rafael
>
Rafael Wysocki May 19, 2011, 10:03 p.m. UTC | #6
On Thursday, May 19, 2011, Jian Peng wrote:
> Hi, Rafael,
> 
> I am using internal WDC laptop HDD. I will give SDD a try even though I only
> have non-Intel SDD now.
> 
> Is your SDD the boot disk (by swapping the built-in HDD out)

Yes, it is.

> or the extra one installed using device bay on Dell laptop?

The machine is not a Dell.  It's an Acer.

Thanks,
Rafael
Valdis Klētnieks May 20, 2011, 3:40 p.m. UTC | #7
On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:

> > @@ -1353,6 +1332,8 @@
> >
> >
> >   ahci_start_engine(ap);
> >
> > + msleep(5);
> > +
> >   if (online)
> >
> >    *class = ahci_dev_classify(ap);
> >

It may very well be that adding a magic msleep(5) here just Makes It Work, but
I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
during the msleep() unless somebody *else* sets it - in which case the locking
is screwed up as we're not forcing a re-read of the value).  The msleep() probably
needs to be before something else further down in the code (but I have no idea
exactly what).
Tejun Heo May 20, 2011, 3:43 p.m. UTC | #8
On Fri, May 20, 2011 at 11:40:20AM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
> 
> > > @@ -1353,6 +1332,8 @@
> > >
> > >
> > >   ahci_start_engine(ap);
> > >
> > > + msleep(5);
> > > +
> > >   if (online)
> > >
> > >    *class = ahci_dev_classify(ap);
> > >
> 
> It may very well be that adding a magic msleep(5) here just Makes It Work, but
> I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> during the msleep() unless somebody *else* sets it - in which case the locking
> is screwed up as we're not forcing a re-read of the value).  The msleep() probably
> needs to be before something else further down in the code (but I have no idea
> exactly what).

At this point, I think it would be better to simply add a flag and
enable the check for the affected controller.

Thanks.
Jian Peng May 20, 2011, 6:21 p.m. UTC | #9
Sorry that I need fix my gmail setting to make it show up in LKML.

Hi, Tejun/Valdis,

Since this is an interoperability issue of SATA host controller, the
first step I want to try it to make sure the tweak that MAKE my
controller WORK does not break other controllers.
You are both right that adding this majic 5ms delay at this place
should not be the final solution.

If this magic 5ms delay works on other affected systems, I plan to
post a new patch that inside ahci_start_engine(), still perform same
check, and show warning message if failed, but will set a flag, then
still set START bit, and if there is such failure flag, add 5ms delay.

Valdis, could you apply the following patch and retest it?

Tejun, please review it.

--- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700

+++ c/drivers/ata/libahci.c 2011-05-20 09:48:06.194663506 -0700
@@ -540,6 +540,7 @@

  void __iomem *port_mmio = ahci_port_base(ap);
  u32 tmp;
  u8 status;

+ int err = 0;


  status = readl(port_mmio + PORT_TFDATA) & 0xFF;


@@ -553,12 +554,12 @@

   * specific controller will fail under this condition
   */
  if (status & (ATA_BUSY | ATA_DRQ))
-  return;

+  err = 1;

  else {
   ahci_scr_read(&ap->link, SCR_STATUS, &tmp);

   if ((tmp & 0xf) != 0x3)
-   return;

+   err = 1;
  }

  /* start DMA */
@@ -566,6 +567,13 @@
  tmp |= PORT_CMD_START;
  writel(tmp, port_mmio + PORT_CMD);
  readl(port_mmio + PORT_CMD); /* flush */
+
+ /* Some controllers need longer time to be ready */
+ if(err) {
+  printk(KERN_WARNING
+   "Controller in wrong state when setting START bit\n");
+  msleep(5);
+ }
 }
 EXPORT_SYMBOL_GPL(ahci_start_engine);

On Fri, May 20, 2011 at 8:43 AM, Tejun Heo <tj@kernel.org> wrote:
>
> On Fri, May 20, 2011 at 11:40:20AM -0400, Valdis.Kletnieks@vt.edu wrote:
> > On Wed, 18 May 2011 17:14:56 PDT, Jian Peng said:
> >
> > > > @@ -1353,6 +1332,8 @@
> > > >
> > > >
> > > >   ahci_start_engine(ap);
> > > >
> > > > + msleep(5);
> > > > +
> > > >   if (online)
> > > >
> > > >    *class = ahci_dev_classify(ap);
> > > >
> >
> > It may very well be that adding a magic msleep(5) here just Makes It Work, but
> > I have a gut feeling that it's in the wrong place (for starters, 'online' can't change
> > during the msleep() unless somebody *else* sets it - in which case the locking
> > is screwed up as we're not forcing a re-read of the value).  The msleep() probably
> > needs to be before something else further down in the code (but I have no idea
> > exactly what).
>
> At this point, I think it would be better to simply add a flag and
> enable the check for the affected controller.
>
> Thanks.
>
> --
> tejun
diff mbox

Patch

diff -Naur a/drivers/ata/libahci.c b/drivers/ata/libahci.c
--- a/drivers/ata/libahci.c 2011-05-18 14:23:36.564665643 -0700
+++ b/drivers/ata/libahci.c 2011-05-18 14:24:52.564614378 -0700
@@ -539,27 +539,6 @@ 
 {
  void __iomem *port_mmio = ahci_port_base(ap);
  u32 tmp;
- u8 status;
-
- status = readl(port_mmio + PORT_TFDATA) & 0xFF;
-
- /*
-  * At end of section 10.1 of AHCI spec (rev 1.3), it states
-  * Software shall not set PxCMD.ST to 1 until it is determined
-  * that a functoinal device is present on the port as determined by
-  * PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h
-  *
-  * Even though most AHCI host controllers work without this check,
-  * specific controller will fail under this condition
-  */
- if (status & (ATA_BUSY | ATA_DRQ))
-  return;
- else {
-  ahci_scr_read(&ap->link, SCR_STATUS, &tmp);
-
-  if ((tmp & 0xf) != 0x3)
-   return;
- }

  /* start DMA */
  tmp = readl(port_mmio + PORT_CMD);
@@ -1353,6 +1332,8 @@ 

  ahci_start_engine(ap);

+ msleep(5);
+
  if (online)
   *class = ahci_dev_classify(ap);