
PCI: hv: Do not wait forever on a device that has disappeared

Message ID KL1P15301MB0006B36AD458D43907DF82B0BF6D0@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Dexuan Cui May 29, 2018, 7:58 p.m. UTC
> From: Michael Kelley (EOSG)
> Sent: Monday, May 28, 2018 17:19
> 
> While this patch solves the immediate problem of getting hung waiting
> for a response from Hyper-V that will never come, there's another scenario
> to look at that I think introduces a race.  Suppose the guest VM issues a
> vmbus_sendpacket() request in one of the cases covered by this patch,
> and suppose that Hyper-V queues a response to the request, and then
> immediately follows with a rescind request.   Processing the response will
> get queued to a tasklet associated with the channel, while processing the
> rescind will get queued to a tasklet associated with the top-level vmbus
> connection.   From what I can see, the code doesn't impose any ordering
> on processing the two.  If the rescind is processed first, the new
> wait_for_response() function may wake up, notice the rescind flag, and
> return an error.  Its caller will return an error, and in doing so pop the
> completion packet off the stack.   When the response is processed later,
> it will try to signal completion via a completion packet that no longer
> exists, and memory corruption will likely occur.
> 
> Am I missing anything that would prevent this scenario from happening?
> It is admittedly low probability, and a solution seems non-trivial.  I haven't
> looked specifically, but a similar scenario is probably possible with the
> drivers for other VMbus devices.  We should work on a generic solution.
> 
> Michael

Thanks for spotting the race! 

IMO we can disable the per-channel tasklet to exclude the race:

--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -565,6 +565,7 @@ static int wait_for_response(struct hv_device *hdev,
 {
        while (true) {
                if (hdev->channel->rescind) {
+                       tasklet_disable(&hdev->channel->callback_event);
                        dev_warn_once(&hdev->device, "The device is gone.\n");
                        return -ENODEV;
                }

This way, when we exit the loop, we're sure hv_pci_onchannelcallback() can not 
run anymore. What do you think of this?
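
Just to recap the pattern the race threatens (the code below only approximates
the driver, for illustration): the completion packet lives on the caller's
stack, so a complete() that arrives after wait_for_response() has already
returned would write into a dead stack frame:

struct hv_pci_compl {
        struct completion host_event;
        s32 completion_status;
};

/* Completion callback, invoked from hv_pci_onchannelcallback() (tasklet): */
static void hv_pci_generic_compl(void *context, struct pci_response *resp,
                                 int resp_packet_size)
{
        struct hv_pci_compl *comp_pkt = context;

        comp_pkt->completion_status = resp->status;
        complete(&comp_pkt->host_event);
}

/* Caller side (e.g. hv_pci_protocol_negotiation()), heavily abridged: */
        struct hv_pci_compl comp_pkt;   /* on this function's stack */

        init_completion(&comp_pkt.host_event);
        pkt->completion_func = hv_pci_generic_compl;
        pkt->compl_ctxt = &comp_pkt;

        ret = vmbus_sendpacket(hdev->channel, &pkt->message, size,
                               (unsigned long)pkt, VM_PKT_DATA_INBAND,
                               VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
        if (ret)
                return ret;

        /*
         * If this returns -ENODEV on rescind while the host's response is
         * still queued on the channel, the later complete() would land in a
         * stack frame that no longer exists -- unless the channel tasklet is
         * stopped first.
         */
        ret = wait_for_response(hdev, &comp_pkt.host_event);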


It looks like the list of the other vmbus devices that can be hot-removed is:

the hv_utils devices
hv_sock devices
storvsc device
netvsc device

From what I checked, the first 3 types of devices don't have this "send a request to the
host and wait for the response forever" pattern. NetVSC should be fixed as it has
the same pattern.

-- Dexuan

Comments

Michael Kelley (EOSG) May 31, 2018, 4:40 p.m. UTC | #1
> From: Dexuan Cui
> Sent: Tuesday, May 29, 2018 12:58 PM
> Subject: RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared
> 
> > From: Michael Kelley (EOSG)
> > Sent: Monday, May 28, 2018 17:19
> >
> > While this patch solves the immediate problem of getting hung waiting
> > for a response from Hyper-V that will never come, there's another scenario
> > to look at that I think introduces a race.  Suppose the guest VM issues a
> > vmbus_sendpacket() request in one of the cases covered by this patch,
> > and suppose that Hyper-V queues a response to the request, and then
> > immediately follows with a rescind request.   Processing the response will
> > get queued to a tasklet associated with the channel, while processing the
> > rescind will get queued to a tasklet associated with the top-level vmbus
> > connection.   From what I can see, the code doesn't impose any ordering
> > on processing the two.  If the rescind is processed first, the new
> > wait_for_response() function may wake up, notice the rescind flag, and
> > return an error.  Its caller will return an error, and in doing so pop the
> > completion packet off the stack.   When the response is processed later,
> > it will try to signal completion via a completion packet that no longer
> > exists, and memory corruption will likely occur.
> >
> > Am I missing anything that would prevent this scenario from happening?
> > It is admittedly low probability, and a solution seems non-trivial.  I haven't
> > looked specifically, but a similar scenario is probably possible with the
> > drivers for other VMbus devices.  We should work on a generic solution.
> >
> > Michael
> 
> Thanks for spotting the race!
> 
> IMO we can disable the per-channel tasklet to exclude the race:
> 
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -565,6 +565,7 @@ static int wait_for_response(struct hv_device *hdev,
>  {
>         while (true) {
>                 if (hdev->channel->rescind) {
> +                       tasklet_disable(&hdev->channel->callback_event);
>                         dev_warn_once(&hdev->device, "The device is gone.\n");
>                         return -ENODEV;
>                 }
> 
> This way, when we exit the loop, we're sure hv_pci_onchannelcallback() can not
> run anymore. What do you think of this?

I've stared at this and the tasklet code over a couple of days now.  Adding the
call to tasklet_disable() solves the immediate problem of preventing 
hv_pci_onchannelcallback() from calling complete() against a completion packet
that has been popped off the stack.  But in doing so, it simply pushes the core
problem further down the road and leaves it unresolved.

tasklet_disable() does not prevent the tasklet from being scheduled.  So if there
is a response from Hyper-V to the original message, the tasklet still gets
scheduled.  Because it is disabled, it will sit in the tasklet queue and be skipped
over each time the queue is processed.  Later,  when the channel is eventually
deleted in free_channel(), tasklet_kill() is called.  Unfortunately, tasklet_kill()
will get stuck in an infinite loop, waiting for the tasklet to run.   There aren't
any tasklet interfaces to dequeue an already scheduled tasklet.
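
For reference, here is the relevant behavior, simplified from kernel/softirq.c
(not a verbatim copy):

/* tasklet_action() on this CPU, roughly: */
while (list) {
        struct tasklet_struct *t = list;

        list = list->next;
        if (tasklet_trylock(t)) {
                if (!atomic_read(&t->count)) {          /* not disabled */
                        test_and_clear_bit(TASKLET_STATE_SCHED, &t->state);
                        t->func(t->data);
                        tasklet_unlock(t);
                        continue;
                }
                tasklet_unlock(t);
        }
        /*
         * Disabled (or running on another CPU): put t back on the per-CPU
         * queue and re-raise TASKLET_SOFTIRQ.  TASKLET_STATE_SCHED stays set.
         */
}

void tasklet_kill(struct tasklet_struct *t)
{
        /* Only returns once TASKLET_STATE_SCHED clears, i.e. t->func() ran. */
        while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
                do {
                        yield();
                } while (test_bit(TASKLET_STATE_SCHED, &t->state));
        }
        tasklet_unlock_wait(t);
        clear_bit(TASKLET_STATE_SCHED, &t->state);
}

So a tasklet that is disabled but already scheduled never clears
TASKLET_STATE_SCHED, and tasklet_kill() spins forever.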

Please double-check my reasoning.  To solve this problem, I think the VMbus
driver code will need some additional synchronization between rescind
notifications and a response, which may or may not have been sent, and
which could be processed after the rescind.  I haven't yet thought about
what this synchronization might need to look like.

Michael

> 
> 
> It looks like the list of the other vmbus devices that can be hot-removed is:
> 
> the hv_utils devices
> hv_sock devices
> storvsc device
> netvsc device
> 
> From what I checked, the first 3 types of devices don't have this "send a request to the
> host and wait for the response forever" pattern. NetVSC should be fixed as it has
> the same pattern.
> 
> -- Dexuan
Dexuan Cui May 31, 2018, 9:01 p.m. UTC | #2
> From: Michael Kelley (EOSG)
> Sent: Thursday, May 31, 2018 09:41
> >
> > IMO we can disable the per-channel tasklet to exclude the race:
> > This way, when we exit the loop, we're sure hv_pci_onchannelcallback() can not
> > run anymore. What do you think of this?
> 
> I've stared at this and the tasklet code over a couple of days now.  Adding the
> call to tasklet_disable() solves the immediate problem of preventing
> hv_pci_onchannelcallback() from calling complete() against a completion packet
> that has been popped off the stack.  But in doing so, it simply pushes the core
> problem further down the road and leaves it unresolved.
> 
> tasklet_disable() does not prevent the tasklet from being scheduled.  So if there
> is a response from Hyper-V to the original message, the tasklet still gets
> scheduled.  Because it is disabled, it will sit in the tasklet queue and be skipped
> over each time the queue is processed.  Later, when the channel is eventually
> deleted in free_channel(), tasklet_kill() is called.  Unfortunately, tasklet_kill()
> will get stuck in an infinite loop, waiting for the tasklet to run.  There aren't
> any tasklet interfaces to dequeue an already scheduled tasklet.

I think you're correct.
 
> Please double-check my reasoning.  To solve this problem, I think the VMbus
> driver code will need some additional synchronization between rescind
> notifications and a response, which may or may not have been sent, and
> which could be processed after the rescind.  I haven't yet thought about
> what this synchronization might need to look like.
> 
> Michael

Yes, it looks like the VMBus driver needs to provide an API to cope with this.
I'll try to further investigate the issue.
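
Just as a strawman for what such an API might look like (the
vmbus_quiesce_channel_callback() below is hypothetical, not an existing
helper), wait_for_response() could then do something roughly like this:

/*
 * Strawman only.  A real helper would have to (1) prevent new runs of the
 * channel callback, (2) wait for any in-flight run to finish, and (3) leave
 * the tasklet in a state that the later tasklet_kill() in free_channel()
 * can cope with.
 */
static int wait_for_response(struct hv_device *hdev, struct completion *comp)
{
        while (true) {
                if (hdev->channel->rescind) {
                        vmbus_quiesce_channel_callback(hdev->channel);
                        dev_warn_once(&hdev->device, "The device is gone.\n");
                        return -ENODEV;
                }

                if (wait_for_completion_timeout(comp, HZ / 10))
                        break;
        }

        return 0;
}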

-- Dexuan

Patch

--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -565,6 +565,7 @@  static int wait_for_response(struct hv_device *hdev,
 {
        while (true) {
                if (hdev->channel->rescind) {
+                       tasklet_disable(&hdev->channel->callback_event);
                        dev_warn_once(&hdev->device, "The device is gone.\n");
                        return -ENODEV;
                }