Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)

Message ID 1413927152.4202.195.camel@ul30vt.home (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Alex Williamson Oct. 21, 2014, 9:32 p.m. UTC
On Tue, 2014-10-21 at 15:06 -0600, Alex Williamson wrote:
> Hi Andreas,
> 
> On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
> > Hello Alex,
> > 
> > Alex Williamson wrote:
> > > Hi Andreas,
> > [...]
> > > Sorry for the breakage.  Is it possible to run lspci on the device in a
> > > loop from the host and capture whether we're failing to restore some of
> > > the VC bits to their previous state? 
> > 
> > > Does the problem also occur if you
> > > unbind from host driver,
> > 
> > The machine is booted w/ blacklisted ath9k. Then, the device is bound to
> > vfio:
> > 
> > echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> > 
> > afterwards the VM is started -> hang.
> > 
> > W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
> > any problem.
> > 
> > > echo 1 > reset in pci-sysfs,
> > 
> > echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
> > bound to vfio. Even after unbinding from vfio and rebinding to vfio
> > again ... .
> > 
> > > and re-bind to the
> > 
> > Do you mean loading ath9k in host system after unbinding from vfio? If
> > yes: Works w/o any problem. It's even possible to reset it or do a
> > ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
> > again and reset it, ....
> > 
> > Looks like the hang is only triggered by qemu-system-x86_64 on startup of
> > the VM.

Also, this might be because QEMU since 1.7 will favor doing a bus reset
for a device over PM reset while the sysfs reset interface will only do
a bus reset if there are no other methods available and there are no
other devices on the bus.  Can you reproduce the hang using the sysfs
reset interface without QEMU if you modify the kernel like this:




> > > host?  I'll also try to reproduce on my 990fx system, but I won't be
> > > able to do that until next week due to travel.  Thanks,
> 
> Could you send me the lspci -vvvxxxx for the device and parent root
> port?  Thanks,
> 
> Alex
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Andreas Hartmann Oct. 22, 2014, 4:22 p.m. UTC | #1
Alex Williamson wrote:
> On Tue, 2014-10-21 at 15:06 -0600, Alex Williamson wrote:
>> Hi Andreas,
>>
>> On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
>>> Hello Alex,
>>>
>>> Alex Williamson wrote:
>>>> Hi Andreas,
>>> [...]
>>>> Sorry for the breakage.  Is it possible to run lspci on the device in a
>>>> loop from the host and capture whether we're failing to restore some of
>>>> the VC bits to their previous state? 
>>>
>>>> Does the problem also occur if you
>>>> unbind from host driver,
>>>
>>> The machine is booted w/ blacklisted ath9k. Then, the device is bound to
>>> vfio:
>>>
>>> echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
>>> echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
>>> echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
>>>
>>> afterwards the VM is started -> hang.
>>>
>>> W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
>>> any problem.
>>>
>>>> echo 1 > reset in pci-sysfs,
>>>
>>> echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem while
>>> bound to vfio. Even after unbinding from vfio and rebinding to vfio
>>> again ... .
>>>
>>>> and re-bind to the
>>>
>>> Do you mean loading ath9k in host system after unbinding from vfio? If
>>> yes: Works w/o any problem. It's even possible to reset it or do a
>>> ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
>>> again and reset it, ....
>>>
>>> Looks like the hang is only triggered by qemu-system-x86_64 on startup of
>>> the VM.
> 
> Also, this might be because QEMU since 1.7 will favor doing a bus reset
> for a device over PM reset while the sysfs reset interface will only do
> a bus reset if there are no other methods available and there are no
> other devices on the bus.  Can you reproduce the hang using the sysfs
> reset interface without QEMU if you modify the kernel like this:
> 
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
>         if (rc != -ENOTTY)
>                 goto done;
>  
> -       rc = pci_pm_reset(dev, probe);
> +       rc = pci_dev_reset_slot_function(dev, probe);
>         if (rc != -ENOTTY)
>                 goto done;
>  
> -       rc = pci_dev_reset_slot_function(dev, probe);
> +       rc = pci_parent_bus_reset(dev, probe);
>         if (rc != -ENOTTY)
>                 goto done;
>  
> -       rc = pci_parent_bus_reset(dev, probe);
> +       rc = pci_pm_reset(dev, probe);
>  done:
>         return rc;
>  }

This way it's crashing with echo 1 > reset, too.


Regards,
Andreas
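
For reference, "echo 1 > reset" in this exchange is shorthand for writing
to the device's reset attribute in pci-sysfs. A minimal sketch of that
step follows. The function name pci_sysfs_reset and the PCI_SYSFS
override are made up for illustration and do not come from the thread or
from any existing tool. Per the report above, running it against
0000:03:00.0 on a kernel with the reordering patch applied should be
expected to hang the machine.

```shell
# Hypothetical helper (not from the thread): reset one PCI function
# through its pci-sysfs "reset" attribute.
# PCI_SYSFS is an assumed override so the helper can be pointed at a
# scratch directory; by default the real sysfs tree is used.
pci_sysfs_reset() {
    dev=$1    # full address including domain, e.g. 0000:03:00.0
    node="${PCI_SYSFS:-/sys/bus/pci/devices}/$dev/reset"

    # The kernel only exposes the reset attribute when it has found at
    # least one usable reset method for the function.
    if [ ! -w "$node" ]; then
        echo "pci_sysfs_reset: $node missing or not writable" >&2
        return 1
    fi

    # Writing 1 asks the kernel to reset the function with whichever
    # method its fallback chain selects.
    echo 1 > "$node"
}
```

Usage (as root): pci_sysfs_reset 0000:03:00.0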

Alex Williamson Oct. 22, 2014, 8:36 p.m. UTC | #2
On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> >         if (rc != -ENOTTY)
> >                 goto done;
> >  
> > -       rc = pci_pm_reset(dev, probe);
> > +       rc = pci_dev_reset_slot_function(dev, probe);
> >         if (rc != -ENOTTY)
> >                 goto done;
> >  
> > -       rc = pci_dev_reset_slot_function(dev, probe);
> > +       rc = pci_parent_bus_reset(dev, probe);
> >         if (rc != -ENOTTY)
> >                 goto done;
> >  
> > -       rc = pci_parent_bus_reset(dev, probe);
> > +       rc = pci_pm_reset(dev, probe);
> >  done:
> >         return rc;
> >  }
> 
> This way it's crashing with echo 1 > reset, too.

Ok, so it's somehow related to doing a bus reset with virtual channel
save/restore while PM reset with VC save/restore works ok as apparently
does bus reset without VC save/restore.  Let's try to do a manual bus
reset so we can look at the post reset state of the device before the
kernel tries to restore it.

First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
know it's not being used.

Next capture lspci -xxxx -s 3:00.0 so we have the starting state.

Then we'll do a bus reset using setpci:
# setpci -s 00:05.0 3e.w=40:40
<if you script this, wait at least 2ms here>
# setpci -s 00:05.0 3e.w=00:40
<wait 1 second here>

Now re-capture lspci -xxxx -s 3:00.0
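
The steps above can be scripted roughly as follows. This is only a
sketch under a few assumptions: pciutils is installed, the script runs
as root, and sleep accepts fractional seconds (GNU coreutils does). The
function name manual_bus_reset and the RUN dry-run hook are invented
for illustration. Mind the units of the first delay: if the usleep
utility is used instead of sleep, its argument is in microseconds, so
2 ms corresponds to "usleep 2000".

```shell
# Sketch of the manual secondary bus reset described above.
# RUN is a hypothetical dry-run hook: with RUN=echo the commands are
# printed instead of executed.
manual_bus_reset() {
    bridge=${1:-00:05.0}    # root port above the device, as in this thread
    dev=${2:-03:00.0}       # device behind that port

    ${RUN:-} lspci -xxxx -s "$dev"             # starting state

    # 0x3e is the Bridge Control register and 0x40 is its Secondary Bus
    # Reset bit. setpci's VALUE:MASK form changes only the masked bit.
    ${RUN:-} setpci -s "$bridge" 3e.w=40:40    # assert the reset
    ${RUN:-} sleep 0.002                       # hold it for 2 ms
    ${RUN:-} setpci -s "$bridge" 3e.w=00:40    # release the reset
    ${RUN:-} sleep 1                           # give the device time to return

    ${RUN:-} lspci -xxxx -s "$dev"             # post-reset state
}
```

Calling "RUN=echo; manual_bus_reset" first prints the command sequence
without touching the hardware, which makes it easy to double-check the
bridge and device addresses before running it for real.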

The interesting lines for your device are 140: and 150:, so if you want
to avoid sending massive emails you can just send those for the before
and after.  You'll need to reboot the system before you do anything else
with this device since it's now in an uninitialized state.  Based on
what the lspci output reports (or whether you experience a hang simply
from this), we may want to try writing additional bits with setpci to
mimic the VC restore behavior.  Thanks,

Alex
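
To send only the 140: and 150: rows mentioned above, the two lspci
captures can be saved to files and filtered. A small sketch, assuming
the captures were written with "lspci -xxxx -s 3:00.0 > FILE"; the
function names and the file names in the usage line are hypothetical.

```shell
# vc_rows FILE: print the 140: and 150: rows of a saved `lspci -xxxx`
# hex dump (each dump row starts with its hex offset and a colon).
vc_rows() {
    grep -E '^1[45]0: ' "$1"
}

# compare_vc_rows BEFORE AFTER: show whether those two rows changed
# between the pre-reset and post-reset captures.
compare_vc_rows() {
    before=$(vc_rows "$1")
    after=$(vc_rows "$2")
    if [ "$before" = "$after" ]; then
        echo "rows 140: and 150: unchanged"
    else
        echo "before:"
        echo "$before"
        echo "after:"
        echo "$after"
    fi
}
```

Usage: compare_vc_rows before.txt after.txt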

Andreas Hartmann Oct. 23, 2014, 4 p.m. UTC | #3
Alex Williamson wrote:
> On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
>>>         if (rc != -ENOTTY)
>>>                 goto done;
>>>  
>>> -       rc = pci_pm_reset(dev, probe);
>>> +       rc = pci_dev_reset_slot_function(dev, probe);
>>>         if (rc != -ENOTTY)
>>>                 goto done;
>>>  
>>> -       rc = pci_dev_reset_slot_function(dev, probe);
>>> +       rc = pci_parent_bus_reset(dev, probe);
>>>         if (rc != -ENOTTY)
>>>                 goto done;
>>>  
>>> -       rc = pci_parent_bus_reset(dev, probe);
>>> +       rc = pci_pm_reset(dev, probe);
>>>  done:
>>>         return rc;
>>>  }
>>
>> This way it's crashing with echo 1 > reset, too.
> 
> Ok, so it's somehow related to doing a bus reset with virtual channel
> save/restore while PM reset with VC save/restore works ok as apparently
> does bus reset without VC save/restore.  Let's try to do a manual bus
> reset so we can look at the post reset state of the device before the
> kernel tries to restore it.
> 
> First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
> know it's not being used.
> 
> Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
> 
> Then we'll do a bus reset using setpci:
> # setpci -s 00:05.0 3e.w=40:40
> <if you script this, wait at least 2ms here>
> # setpci -s 00:05.0 3e.w=00:40
> <wait 1 second here>
> 
> Now re-capture lspci -xxxx -s 3:00.0

The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
linux 3.14)

lspci -xxxx -s 3:00.0
setpci -s 00:05.0 3e.w=40:40
usleep 10
setpci -s 00:05.0 3e.w=00:40
sleep 1
lspci -xxxx -s 3:00.0

I didn't get the second lspci because the machine already was hanging.
The first output is attached completely.



Hope this helps,
thanks,
regards,
Andreas
Alex Williamson Oct. 23, 2014, 4:33 p.m. UTC | #4
On Thu, 2014-10-23 at 18:00 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
> >> Alex Williamson wrote:
> >>> --- a/drivers/pci/pci.c
> >>> +++ b/drivers/pci/pci.c
> >>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_pm_reset(dev, probe);
> >>> +       rc = pci_dev_reset_slot_function(dev, probe);
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_dev_reset_slot_function(dev, probe);
> >>> +       rc = pci_parent_bus_reset(dev, probe);
> >>>         if (rc != -ENOTTY)
> >>>                 goto done;
> >>>  
> >>> -       rc = pci_parent_bus_reset(dev, probe);
> >>> +       rc = pci_pm_reset(dev, probe);
> >>>  done:
> >>>         return rc;
> >>>  }
> >>
> >> This way it's crashing with echo 1 > reset, too.
> > 
> > Ok, so it's somehow related to doing a bus reset with virtual channel
> > save/restore while PM reset with VC save/restore works ok as apparently
> > does bus reset without VC save/restore.  Let's try to do a manual bus
> > reset so we can look at the post reset state of the device before the
> > kernel tries to restore it.
> > 
> > First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
> > know it's not being used.
> > 
> > Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
> > 
> > Then we'll do a bus reset using setpci:
> > # setpci -s 00:05.0 3e.w=40:40
> > <if you script this, wait at least 2ms here>
> > # setpci -s 00:05.0 3e.w=00:40
> > <wait 1 second here>
> > 
> > Now re-capture lspci -xxxx -s 3:00.0
> 
> The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
> linux 3.14)
> 
> lspci -xxxx -s 3:00.0
> setpci -s 00:05.0 3e.w=40:40
> usleep 10
> setpci -s 00:05.0 3e.w=00:40
> sleep 1
> lspci -xxxx -s 3:00.0
> 
> I didn't get the second lspci because the machine already was hanging.
> The first output is attached completely.

Hmm, that doesn't make much sense.  You had found that if you disabled
the VC save/restore then QEMU works.  That should have still been using
secondary bus reset as we're trying to do here, so I don't understand
why we can't do a manual secondary bus reset now.

If you use Bjorn's previous patch to disable VC save/restore and my
patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
entry for the device also still cause a hang?

Can you provide a link to the specific model for this card?  Thanks,

Alex

Andreas Hartmann Oct. 23, 2014, 5:12 p.m. UTC | #5
Alex Williamson wrote:
> On Thu, 2014-10-23 at 18:00 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Wed, 2014-10-22 at 18:22 +0200, Andreas Hartmann wrote:
>>>> Alex Williamson wrote:
>>>>> --- a/drivers/pci/pci.c
>>>>> +++ b/drivers/pci/pci.c
>>>>> @@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
>>>>>         if (rc != -ENOTTY)
>>>>>                 goto done;
>>>>>  
>>>>> -       rc = pci_pm_reset(dev, probe);
>>>>> +       rc = pci_dev_reset_slot_function(dev, probe);
>>>>>         if (rc != -ENOTTY)
>>>>>                 goto done;
>>>>>  
>>>>> -       rc = pci_dev_reset_slot_function(dev, probe);
>>>>> +       rc = pci_parent_bus_reset(dev, probe);
>>>>>         if (rc != -ENOTTY)
>>>>>                 goto done;
>>>>>  
>>>>> -       rc = pci_parent_bus_reset(dev, probe);
>>>>> +       rc = pci_pm_reset(dev, probe);
>>>>>  done:
>>>>>         return rc;
>>>>>  }
>>>>
>>>> This way it's crashing with echo 1 > reset, too.
>>>
>>> Ok, so it's somehow related to doing a bus reset with virtual channel
>>> save/restore while PM reset with VC save/restore works ok as apparently
>>> does bus reset without VC save/restore.  Let's try to do a manual bus
>>> reset so we can look at the post reset state of the device before the
>>> kernel tries to restore it.
>>>
>>> First bind the target device 03:00.0 to pci-stub or vfio-pci so that we
>>> know it's not being used.
>>>
>>> Next capture lspci -xxxx -s 3:00.0 so we have the starting state.
>>>
>>> Then we'll do a bus reset using setpci:
>>> # setpci -s 00:05.0 3e.w=40:40
>>> <if you script this, wait at least 2ms here>
>>> # setpci -s 00:05.0 3e.w=00:40
>>> <wait 1 second here>
>>>
>>> Now re-capture lspci -xxxx -s 3:00.0
>>
>> The machine is booted w/ vfio bound to 3:00.0 as usual (now for testing
>> linux 3.14)
>>
>> lspci -xxxx -s 3:00.0
>> setpci -s 00:05.0 3e.w=40:40
>> usleep 10
>> setpci -s 00:05.0 3e.w=00:40
>> sleep 1
>> lspci -xxxx -s 3:00.0
>>
>> I didn't get the second lspci because the machine already was hanging.
>> The first output is attached completely.
> 
> Hmm, that doesn't make much sense.  You had found that if you disabled
> the VC save/restore then QEMU works.  That should have still been using
> secondary bus reset as we're trying to do here, so I don't understand
> why we can't do a manual secondary bus reset now.
> 
> If you use Bjorn's previous patch to disable VC save/restore and my
> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> entry for the device also still cause a hang?

I will test it.

> Can you provide a link to the specific model for this card?  Thanks,

http://www.tp-link.com.de/support/download/?model=TL-WDN4800&version=V1


Regards,
Andreas
Andreas Hartmann Oct. 23, 2014, 5:33 p.m. UTC | #6
Alex Williamson wrote:
[...]
> If you use Bjorn's previous patch to disable VC save/restore and my
> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> entry for the device also still cause a hang?

Yes - it's hanging too (w/ vfio bound to the device - didn't test other
possibilities).


Regards,
Andreas
Alex Williamson Oct. 23, 2014, 7:37 p.m. UTC | #7
On Thu, 2014-10-23 at 19:33 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> [...]
> > If you use Bjorn's previous patch to disable VC save/restore and my
> > patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
> > entry for the device also still cause a hang?
> 
> Yes - it's hanging too (w/ vfio bound to the device - didn't test other
> possibilities).

Does it happen regardless of the slot the card is plugged into?  Thanks,

Alex

Andreas Hartmann Oct. 24, 2014, 2:21 p.m. UTC | #8
Alex Williamson wrote:
> On Thu, 2014-10-23 at 19:33 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>> [...]
>>> If you use Bjorn's previous patch to disable VC save/restore and my
>>> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
>>> entry for the device also still cause a hang?
>>
>> Yes - it's hanging too (w/ vfio bound to the device - didn't test other
>> possibilities).
> 
> Does it happen regardless of the slot the card is plugged into?  Thanks,

Can't say - there is only one usable small PCIe slot. The other slot is
blocked by the graphics card - and the third slot, which should be there
according to the documentation, doesn't exist in reality :-(.


Regards,
Andreas
Andreas Hartmann Oct. 25, 2014, 6:03 a.m. UTC | #9
Alex Williamson wrote:
> On Thu, 2014-10-23 at 19:33 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>> [...]
>>> If you use Bjorn's previous patch to disable VC save/restore and my
>>> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
>>> entry for the device also still cause a hang?
>>
>> Yes - it's hanging too (w/ vfio bound to the device - didn't test other
>> possibilities).
> 
> Does it happen regardless of the slot the card is plugged into?  Thanks,

As I already wrote, it's not possible to plug the device into another
slot. Besides that, let me stress some "findings" I have made over the
past few weeks since I became aware of this problem. Maybe they give
you an idea of what's going on:


- I did all of the tests in text mode on the console. Normally, there is
a blinking cursor. When doing the echo 1 > reset, the shell doesn't come
back and the cursor immediately blinks more slowly, meaning each on /
off phase takes longer. It "blinks" at most 2 more times this way until
it is finally dead.
It looks as if the machine suddenly had an extremely high load (there
are 8 cores!) - but this doesn't seem to be true, because the CPU fan
stays quiet - its rpm doesn't change at all.


- Most of the time when a test fails, I have problems with USB after
the hang (it's the Etron device). That is: the initrd isn't able to
communicate with the device (but BIOS and grub2 didn't have any problem,
because the keyboard, which is connected via USB 3, worked fine). At
this point, it is necessary to disconnect the mains completely and wait
half a minute until the problem disappears.

Rarely, I had this problem even at the BIOS stage: the keyboard couldn't
be seen even by the BIOS any more.


- Sometimes (really seldom - it has happened about 3 times so far), it
gets extremely hard to return to normal operation after that hang. For
a few weeks, I have been running kernel 3.12.28-3-desktop out of the box
(= as provided by openSUSE). Sometimes, after testing, I get
(apparently) the same problem (= PCIe passthrough hangs the complete
machine) w/ 3.12.28 as I have with stock >= 3.14. Reconnecting the mains
doesn't help then (I experienced this 2 times in a row after one hang
yesterday). At this point, I have to run kernel 3.10.x (which runs fine
as usual), and only after that does 3.12 work again as expected (this
happened once yesterday during tests w/ the USB 3 devices disabled in
the BIOS).


- I think there is a relationship between how long the hang lasts and
the follow-on problems. If the machine is reset immediately (within
about 1s at most) w/ the reset button, it is possible that there is no
USB problem after reboot and the machine works completely fine with
3.12.x again.


Conclusion (from my point of view):
The broken reset seems to do something really _extremely ugly_ w/ the
hardware, which has the potential to break the hardware "lastingly", or
the software running afterwards isn't able to correctly reconfigure the
system at all - even after reconnecting the mains.
Fortunately I have an old kernel version (3.10.x), which seems to be
able to "repair" the hardware again. But I have to emphasize that the
situation is really highly questionable and I'm meanwhile afraid of
finally breaking my board, which otherwise works really _extremely_
stably.



Out of interest:
Bjorn's patch disables VC save/restore support - and the machine works
fine again. Why is it needed at all if everything seems to work
perfectly w/o it? What's the additional benefit? Or in other words: What
have I been missing until today :-) ? What would be better? What more
could I do?



Thanks,
kind regards,
Andreas

Patch

--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3308,15 +3308,15 @@  static int __pci_dev_reset(struct pci_dev *dev, int prob
        if (rc != -ENOTTY)
                goto done;
 
-       rc = pci_pm_reset(dev, probe);
+       rc = pci_dev_reset_slot_function(dev, probe);
        if (rc != -ENOTTY)
                goto done;
 
-       rc = pci_dev_reset_slot_function(dev, probe);
+       rc = pci_parent_bus_reset(dev, probe);
        if (rc != -ENOTTY)
                goto done;
 
-       rc = pci_parent_bus_reset(dev, probe);
+       rc = pci_pm_reset(dev, probe);
 done:
        return rc;
 }
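
The effect of the reordering in the patch above can be illustrated with
a toy userspace model. This is not kernel code: it only mimics the
"first method that does not return -ENOTTY wins" chain for the three
methods visible in the hunk. All function names are invented, and which
methods the imaginary device supports is an assumption loosely based on
the thread (PM reset and parent bus reset usable, no slot reset).

```shell
# Toy model (not kernel code) of the fallback chain in __pci_dev_reset().
ENOTTY=25    # errno value on Linux; "method not available" in the chain

# Assumed capabilities of the modelled device, for illustration only.
pm_reset()   { echo "pm";  return 0; }
slot_reset() { return "$ENOTTY"; }
bus_reset()  { echo "bus"; return 0; }

# first_usable METHOD...: run the methods in order and stop at the first
# one that returns anything other than ENOTTY.
first_usable() {
    rc=$ENOTTY
    for method in "$@"; do
        "$method"
        rc=$?
        [ "$rc" -ne "$ENOTTY" ] && break
    done
    return "$rc"
}

first_usable pm_reset slot_reset bus_reset    # stock order: prints "pm"
first_usable slot_reset bus_reset pm_reset    # patched order: prints "bus"
```

Under these assumptions the stock order never reaches the bus reset,
which would match the observation that the sysfs reset only hangs once
the order is changed.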