diff mbox series

[v2,1/1] powerpc/vnic: Extend "failover pending" window

Message ID 20201019195233.GA1282438@us.ibm.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v2,1/1] powerpc/vnic: Extend "failover pending" window | expand

Commit Message

Sukadev Bhattiprolu Oct. 19, 2020, 7:52 p.m. UTC
>From 67f8977f636e462a1cd1eadb28edd98ef4f2b756 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Thu, 10 Sep 2020 11:18:41 -0700
Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window

Commit 5a18e1e0c193b introduced the 'failover_pending' state to track
the "failover pending window" - where we wait for the partner to become
ready (after a transport event) before actually attempting to failover.
i.e window is between following two events:

        a. we get a transport event due to a FAILOVER

        b. later, we get CRQ_INITIALIZED indicating the partner is
           ready  at which point we schedule a FAILOVER reset.

and ->failover_pending is true during this window.

If during this window, we attempt to open (or close) a device, we pretend
that the operation succeded and let the FAILOVER reset path complete the
operation.

This is fine, except if the transport event ("a" above) occurs during the
open and after open has already checked whether a failover is pending. If
that happens, we fail the open, which can cause the boot scripts to leave
the interface down requiring administrator to manually bring up the device.

This fix "extends" the failover pending window till we are _actually_
ready to perform the failover reset (i.e until after we get the RTNL
lock). Since open() holds the RTNL lock, we can be sure that we either
finish the open or if the open() fails due to the failover pending window,
we can again pretend that open is done and let the failover complete it.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
Changelog [v2]:
	[Brian King] Ensure we clear failover_pending during hard reset
---
 drivers/net/ethernet/ibm/ibmvnic.c | 36 ++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

Comments

Jakub Kicinski Oct. 21, 2020, 1:25 a.m. UTC | #1
On Mon, 19 Oct 2020 12:52:33 -0700 Sukadev Bhattiprolu wrote:
> From 67f8977f636e462a1cd1eadb28edd98ef4f2b756 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Date: Thu, 10 Sep 2020 11:18:41 -0700
> Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window

I think you copy/pasted the patch into the email client. I can't seem
to apply this patch with git pw:

Patch is empty.
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Can you try re-sending with get-send-email? Or please remove these 4
header lines from the email body.
Sukadev Bhattiprolu Oct. 24, 2020, 12:02 a.m. UTC | #2
Lijun Pan [ljp@linux.vnet.ibm.com] wrote:
>    On Oct 19, 2020, at 2:52 PM, Sukadev Bhattiprolu
>    <[1]sukadev@linux.ibm.com> wrote:
> 
>    From 67f8977f636e462a1cd1eadb28edd98ef4f2b756 Mon Sep 17 00:00:00 2001
>    From: Sukadev Bhattiprolu <[2]sukadev@linux.vnet.ibm.com>
>    Date: Thu, 10 Sep 2020 11:18:41 -0700
>    Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window
>    Commit 5a18e1e0c193b introduced the 'failover_pending' state to track
>    the "failover pending window" - where we wait for the partner to become
>    ready (after a transport event) before actually attempting to failover.
>    i.e window is between following two events:
>           a. we get a transport event due to a FAILOVER
>           b. later, we get CRQ_INITIALIZED indicating the partner is
>              ready  at which point we schedule a FAILOVER reset.
>    and ->failover_pending is true during this window.
>    If during this window, we attempt to open (or close) a device, we
>    pretend
>    that the operation succeded and let the FAILOVER reset path complete
>    the
>    operation.
>    This is fine, except if the transport event ("a" above) occurs during
>    the
>    open and after open has already checked whether a failover is pending.
>    If
>    that happens, we fail the open, which can cause the boot scripts to
>    leave
>    the interface down requiring administrator to manually bring up the
>    device.
>    This fix "extends" the failover pending window till we are _actually_
>    ready to perform the failover reset (i.e until after we get the RTNL
>    lock). Since open() holds the RTNL lock, we can be sure that we either
>    finish the open or if the open() fails due to the failover pending
>    window,
>    we can again pretend that open is done and let the failover complete
>    it.
>    Signed-off-by: Sukadev Bhattiprolu <[3]sukadev@linux.ibm.com>
>    ---
>    Changelog [v2]:
>    [Brian King] Ensure we clear failover_pending during hard reset
>    ---
>    drivers/net/ethernet/ibm/ibmvnic.c | 36 ++++++++++++++++++++++++++----
>    1 file changed, 32 insertions(+), 4 deletions(-)
>    diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
>    b/drivers/net/ethernet/ibm/ibmvnic.c
>    index 1b702a43a5d0..2a0f6f6820db 100644
>    --- a/drivers/net/ethernet/ibm/ibmvnic.c
>    +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>    @@ -1197,18 +1197,27 @@ static int ibmvnic_open(struct net_device
>    *netdev)
>    if (adapter->state != VNIC_CLOSED) {
>    rc = ibmvnic_login(netdev);
>    if (rc)
>    - return rc;
>    + goto out;
>    rc = init_resources(adapter);
>    if (rc) {
>    netdev_err(netdev, "failed to initialize resources\n");
>    release_resources(adapter);
>    - return rc;
>    + goto out;
>    }
>    }
>    rc = __ibmvnic_open(netdev);
>    +out:
>    + /*
>    + * If open fails due to a pending failover, set device state and
>    + * return. Device operation will be handled by reset routine.
>    + */
>    + if (rc && adapter->failover_pending) {
>    + adapter->state = VNIC_OPEN;
>    + rc = 0;
>    + }
>    return rc;
>    }
>    @@ -1931,6 +1940,13 @@ static int do_reset(struct ibmvnic_adapter
>    *adapter,
>      rwi->reset_reason);
>    rtnl_lock();
>    + /*
>    + * Now that we have the rtnl lock, clear any pending failover.
>    + * This will ensure ibmvnic_open() has either completed or will
>    + * block until failover is complete.
>    + */
>    + if (rwi->reset_reason == VNIC_RESET_FAILOVER)
>    + adapter->failover_pending = false;
>    netif_carrier_off(netdev);
>    adapter->reset_reason = rwi->reset_reason;
>    @@ -2211,6 +2227,13 @@ static void __ibmvnic_reset(struct work_struct
>    *work)
>    /* CHANGE_PARAM requestor holds rtnl_lock */
>    rc = do_change_param_reset(adapter, rwi, reset_state);
>    } else if (adapter->force_reset_recovery) {
>    + /*
>    + * Since we are doing a hard reset now, clear the
>    + * failover_pending flag so we don't ignore any
>    + * future MOBILITY or other resets.
>    + */
>    + adapter->failover_pending = false;
>    +
> 
>    I think it would be better to put above chunk of code to
>    do_hard_reset()
>    like you do for do_reset(),  if you really want to extend the window

I put it here because we clear the other flags like force_reset_recovery
also. I have been considering moving the check ->wait_for_reset and the
rtnl lock also into do_hard_reset(). I will queue that reorg separate
from this bug fix.

>    this way.
>    Extending the window that long may cause some resets being
>    skipped in some scenarios though I don’t know yet.

Yes hard to prove, but we have run several tests on this and seems to
be working.

>    We have already seen the migration reset being skipped in some cases.

Is that happening due to this patch or in general? If a migration occurs
while failover is pending, we should review the best way to handle that.
Will failover complete in such a case or should we punt the failover and
process migration instead? I think that should be addressed separately
because that window is smaller but is there even without this patch?

>    So my point is extending the window is kind of risky, and do we have an
>    alternative to address the "open” problem you want to solve originally?
>    For example, would it be a viable approach to only change the code in
>    ibmvnic_open() or __ibmvnic_open(), but not extend this window?

Not sure. We could try and block the open until failover is completed
but a) that could still timeout the application and b) Existing code
"pretends" that failover occurred "just after" open succeeded, so marks
the open successful and lets the failover complete the open.

Besides, we should also not assume that the failover window will be
short right?

> 
>    /* Transport event occurred during previous reset */
>    if (adapter->wait_for_reset) {
>    /* Previous was CHANGE_PARAM; caller locked */
>    @@ -2275,9 +2298,15 @@ static int ibmvnic_reset(struct ibmvnic_adapter
>    *adapter,
>    unsigned long flags;
>    int ret;
>    + /*
>    + * If failover is pending don't schedule any other reset.
>    + * Instead let the failover complete. If there is already a
>    + * a failover reset scheduled, we will detect and drop the
>    + * duplicate reset when walking the ->rwi_list below.
>    + */
>    if (adapter->state == VNIC_REMOVING ||
>       adapter->state == VNIC_REMOVED ||
>    -    adapter->failover_pending) {
>    +    (adapter->failover_pending && reason != VNIC_RESET_FAILOVER)) {
>    ret = EBUSY;
>    netdev_dbg(netdev, "Adapter removing or pending failover, skipping
>    reset\n");
>    goto err;
>    @@ -4653,7 +4682,6 @@ static void ibmvnic_handle_crq(union ibmvnic_crq
>    *crq,
>    case IBMVNIC_CRQ_INIT:
>    dev_info(dev, "Partner initialized\n");
>    adapter->from_passive_init = true;
>    - adapter->failover_pending = false;
>    if (!completion_done(&adapter->init_done)) {
>    complete(&adapter->init_done);
>    adapter->init_done_rc = -EIO;
>    --
>    2.25.4
> 
> References
> 
>    1. mailto:sukadev@linux.ibm.com
>    2. mailto:sukadev@linux.vnet.ibm.com
>    3. mailto:sukadev@linux.ibm.com
Lijun Pan Oct. 27, 2020, 10:32 p.m. UTC | #3
On 2020-10-23 19:02, Sukadev Bhattiprolu wrote:
> Lijun Pan [ljp@linux.vnet.ibm.com] wrote:
>>    On Oct 19, 2020, at 2:52 PM, Sukadev Bhattiprolu
>>    <[1]sukadev@linux.ibm.com> wrote:
>> 
>>    From 67f8977f636e462a1cd1eadb28edd98ef4f2b756 Mon Sep 17 00:00:00 
>> 2001
>>    From: Sukadev Bhattiprolu <[2]sukadev@linux.vnet.ibm.com>
>>    Date: Thu, 10 Sep 2020 11:18:41 -0700
>>    Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window
>>    Commit 5a18e1e0c193b introduced the 'failover_pending' state to 
>> track
>>    the "failover pending window" - where we wait for the partner to 
>> become
>>    ready (after a transport event) before actually attempting to 
>> failover.
>>    i.e window is between following two events:
>>           a. we get a transport event due to a FAILOVER
>>           b. later, we get CRQ_INITIALIZED indicating the partner is
>>              ready  at which point we schedule a FAILOVER reset.
>>    and ->failover_pending is true during this window.
>>    If during this window, we attempt to open (or close) a device, we
>>    pretend
>>    that the operation succeded and let the FAILOVER reset path 
>> complete
>>    the
>>    operation.
>>    This is fine, except if the transport event ("a" above) occurs 
>> during
>>    the
>>    open and after open has already checked whether a failover is 
>> pending.
>>    If
>>    that happens, we fail the open, which can cause the boot scripts to
>>    leave
>>    the interface down requiring administrator to manually bring up the
>>    device.
>>    This fix "extends" the failover pending window till we are 
>> _actually_
>>    ready to perform the failover reset (i.e until after we get the 
>> RTNL
>>    lock). Since open() holds the RTNL lock, we can be sure that we 
>> either
>>    finish the open or if the open() fails due to the failover pending
>>    window,
>>    we can again pretend that open is done and let the failover 
>> complete
>>    it.
>>    Signed-off-by: Sukadev Bhattiprolu <[3]sukadev@linux.ibm.com>
>>    ---
>>    Changelog [v2]:
>>    [Brian King] Ensure we clear failover_pending during hard reset
>>    ---
>>    drivers/net/ethernet/ibm/ibmvnic.c | 36 
>> ++++++++++++++++++++++++++----
>>    1 file changed, 32 insertions(+), 4 deletions(-)
>>    diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
>>    b/drivers/net/ethernet/ibm/ibmvnic.c
>>    index 1b702a43a5d0..2a0f6f6820db 100644
>>    --- a/drivers/net/ethernet/ibm/ibmvnic.c
>>    +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>>    @@ -1197,18 +1197,27 @@ static int ibmvnic_open(struct net_device
>>    *netdev)
>>    if (adapter->state != VNIC_CLOSED) {
>>    rc = ibmvnic_login(netdev);
>>    if (rc)
>>    - return rc;
>>    + goto out;
>>    rc = init_resources(adapter);
>>    if (rc) {
>>    netdev_err(netdev, "failed to initialize resources\n");
>>    release_resources(adapter);
>>    - return rc;
>>    + goto out;
>>    }
>>    }
>>    rc = __ibmvnic_open(netdev);
>>    +out:
>>    + /*
>>    + * If open fails due to a pending failover, set device state and
>>    + * return. Device operation will be handled by reset routine.
>>    + */
>>    + if (rc && adapter->failover_pending) {
>>    + adapter->state = VNIC_OPEN;
>>    + rc = 0;
>>    + }
>>    return rc;
>>    }
>>    @@ -1931,6 +1940,13 @@ static int do_reset(struct ibmvnic_adapter
>>    *adapter,
>>      rwi->reset_reason);
>>    rtnl_lock();
>>    + /*
>>    + * Now that we have the rtnl lock, clear any pending failover.
>>    + * This will ensure ibmvnic_open() has either completed or will
>>    + * block until failover is complete.
>>    + */
>>    + if (rwi->reset_reason == VNIC_RESET_FAILOVER)
>>    + adapter->failover_pending = false;
>>    netif_carrier_off(netdev);
>>    adapter->reset_reason = rwi->reset_reason;
>>    @@ -2211,6 +2227,13 @@ static void __ibmvnic_reset(struct 
>> work_struct
>>    *work)
>>    /* CHANGE_PARAM requestor holds rtnl_lock */
>>    rc = do_change_param_reset(adapter, rwi, reset_state);
>>    } else if (adapter->force_reset_recovery) {
>>    + /*
>>    + * Since we are doing a hard reset now, clear the
>>    + * failover_pending flag so we don't ignore any
>>    + * future MOBILITY or other resets.
>>    + */
>>    + adapter->failover_pending = false;
>>    +
>> 
>>    I think it would be better to put above chunk of code to
>>    do_hard_reset()
>>    like you do for do_reset(),  if you really want to extend the 
>> window
> 
> I put it here because we clear the other flags like 
> force_reset_recovery
> also. I have been considering moving the check ->wait_for_reset and the
> rtnl lock also into do_hard_reset(). I will queue that reorg separate
> from this bug fix.
> 
>>    this way.
>>    Extending the window that long may cause some resets being
>>    skipped in some scenarios though I don’t know yet.
> 
> Yes hard to prove, but we have run several tests on this and seems to
> be working.
> 
>>    We have already seen the migration reset being skipped in some 
>> cases.
> 
> Is that happening due to this patch or in general? If a migration 
> occurs

This patch changed the original behavior and clears failover_pending
only in do_reset which caused migration reset being skipped in the 
scenario like,
do_rest(migration reset) somehow calls into do_hard_reset(failover 
reset),
and the failover_pending is not cleared, so that do_hard_reset is 
skipped,
and migration reset is not completely completed.

This is why Brian King suggested this to fix the regression of v1 patch.
    Changelog [v2]:
    [Brian King] Ensure we clear failover_pending during hard reset

> while failover is pending, we should review the best way to handle 
> that.
> Will failover complete in such a case or should we punt the failover 
> and
> process migration instead? I think that should be addressed separately
> because that window is smaller but is there even without this patch?
> 
>>    So my point is extending the window is kind of risky, and do we 
>> have an
>>    alternative to address the "open” problem you want to solve 
>> originally?
>>    For example, would it be a viable approach to only change the code 
>> in
>>    ibmvnic_open() or __ibmvnic_open(), but not extend this window?
> 
> Not sure. We could try and block the open until failover is completed
> but a) that could still timeout the application and b) Existing code
> "pretends" that failover occurred "just after" open succeeded, so marks
> the open successful and lets the failover complete the open.
> 
> Besides, we should also not assume that the failover window will be
> short right?
> 
>> 
>>    /* Transport event occurred during previous reset */
>>    if (adapter->wait_for_reset) {
>>    /* Previous was CHANGE_PARAM; caller locked */
>>    @@ -2275,9 +2298,15 @@ static int ibmvnic_reset(struct 
>> ibmvnic_adapter
>>    *adapter,
>>    unsigned long flags;
>>    int ret;
>>    + /*
>>    + * If failover is pending don't schedule any other reset.
>>    + * Instead let the failover complete. If there is already a
>>    + * a failover reset scheduled, we will detect and drop the
>>    + * duplicate reset when walking the ->rwi_list below.
>>    + */
>>    if (adapter->state == VNIC_REMOVING ||
>>       adapter->state == VNIC_REMOVED ||
>>    -    adapter->failover_pending) {
>>    +    (adapter->failover_pending && reason != VNIC_RESET_FAILOVER)) 
>> {
>>    ret = EBUSY;
>>    netdev_dbg(netdev, "Adapter removing or pending failover, skipping
>>    reset\n");
>>    goto err;
>>    @@ -4653,7 +4682,6 @@ static void ibmvnic_handle_crq(union 
>> ibmvnic_crq
>>    *crq,
>>    case IBMVNIC_CRQ_INIT:
>>    dev_info(dev, "Partner initialized\n");
>>    adapter->from_passive_init = true;
>>    - adapter->failover_pending = false;
>>    if (!completion_done(&adapter->init_done)) {
>>    complete(&adapter->init_done);
>>    adapter->init_done_rc = -EIO;
>>    --
>>    2.25.4
>> 
>> References
>> 
>>    1. mailto:sukadev@linux.ibm.com
>>    2. mailto:sukadev@linux.vnet.ibm.com
>>    3. mailto:sukadev@linux.ibm.com
Lijun Pan Oct. 30, 2020, 4 a.m. UTC | #4
On 2020-10-23 19:02, Sukadev Bhattiprolu wrote:
> Lijun Pan [ljp@linux.vnet.ibm.com] wrote:
>>    On Oct 19, 2020, at 2:52 PM, Sukadev Bhattiprolu
>>    <[1]sukadev@linux.ibm.com> wrote:
>> 
>>    From 67f8977f636e462a1cd1eadb28edd98ef4f2b756 Mon Sep 17 00:00:00 
>> 2001
>>    From: Sukadev Bhattiprolu <[2]sukadev@linux.vnet.ibm.com>
>>    Date: Thu, 10 Sep 2020 11:18:41 -0700
>>    Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window
>>    Commit 5a18e1e0c193b introduced the 'failover_pending' state to 
>> track
>>    the "failover pending window" - where we wait for the partner to 
>> become
>>    ready (after a transport event) before actually attempting to 
>> failover.
>>    i.e window is between following two events:
>>           a. we get a transport event due to a FAILOVER
>>           b. later, we get CRQ_INITIALIZED indicating the partner is
>>              ready  at which point we schedule a FAILOVER reset.
>>    and ->failover_pending is true during this window.
>>    If during this window, we attempt to open (or close) a device, we
>>    pretend
>>    that the operation succeded and let the FAILOVER reset path 
>> complete
>>    the
>>    operation.
>>    This is fine, except if the transport event ("a" above) occurs 
>> during
>>    the
>>    open and after open has already checked whether a failover is 
>> pending.
>>    If
>>    that happens, we fail the open, which can cause the boot scripts to
>>    leave
>>    the interface down requiring administrator to manually bring up the
>>    device.
>>    This fix "extends" the failover pending window till we are 
>> _actually_
>>    ready to perform the failover reset (i.e until after we get the 
>> RTNL
>>    lock). Since open() holds the RTNL lock, we can be sure that we 
>> either
>>    finish the open or if the open() fails due to the failover pending
>>    window,
>>    we can again pretend that open is done and let the failover 
>> complete
>>    it.
>>    Signed-off-by: Sukadev Bhattiprolu <[3]sukadev@linux.ibm.com>
>>    ---
>>    Changelog [v2]:
>>    [Brian King] Ensure we clear failover_pending during hard reset
>>    ---
>>    drivers/net/ethernet/ibm/ibmvnic.c | 36 
>> ++++++++++++++++++++++++++----
>>    1 file changed, 32 insertions(+), 4 deletions(-)
>>    diff --git a/drivers/net/ethernet/ibm/ibmvnic.c
>>    b/drivers/net/ethernet/ibm/ibmvnic.c
>>    index 1b702a43a5d0..2a0f6f6820db 100644
>>    --- a/drivers/net/ethernet/ibm/ibmvnic.c
>>    +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>>    @@ -1197,18 +1197,27 @@ static int ibmvnic_open(struct net_device
>>    *netdev)
>>    if (adapter->state != VNIC_CLOSED) {
>>    rc = ibmvnic_login(netdev);
>>    if (rc)
>>    - return rc;
>>    + goto out;
>>    rc = init_resources(adapter);
>>    if (rc) {
>>    netdev_err(netdev, "failed to initialize resources\n");
>>    release_resources(adapter);
>>    - return rc;
>>    + goto out;
>>    }
>>    }
>>    rc = __ibmvnic_open(netdev);
>>    +out:
>>    + /*
>>    + * If open fails due to a pending failover, set device state and
>>    + * return. Device operation will be handled by reset routine.
>>    + */
>>    + if (rc && adapter->failover_pending) {
>>    + adapter->state = VNIC_OPEN;
>>    + rc = 0;
>>    + }
>>    return rc;
>>    }
>>    @@ -1931,6 +1940,13 @@ static int do_reset(struct ibmvnic_adapter
>>    *adapter,
>>      rwi->reset_reason);
>>    rtnl_lock();
>>    + /*
>>    + * Now that we have the rtnl lock, clear any pending failover.
>>    + * This will ensure ibmvnic_open() has either completed or will
>>    + * block until failover is complete.
>>    + */
>>    + if (rwi->reset_reason == VNIC_RESET_FAILOVER)
>>    + adapter->failover_pending = false;
>>    netif_carrier_off(netdev);
>>    adapter->reset_reason = rwi->reset_reason;
>>    @@ -2211,6 +2227,13 @@ static void __ibmvnic_reset(struct 
>> work_struct
>>    *work)
>>    /* CHANGE_PARAM requestor holds rtnl_lock */
>>    rc = do_change_param_reset(adapter, rwi, reset_state);
>>    } else if (adapter->force_reset_recovery) {
>>    + /*
>>    + * Since we are doing a hard reset now, clear the
>>    + * failover_pending flag so we don't ignore any
>>    + * future MOBILITY or other resets.
>>    + */
>>    + adapter->failover_pending = false;
>>    +
>> 
>>    I think it would be better to put above chunk of code to
>>    do_hard_reset()
>>    like you do for do_reset(),  if you really want to extend the 
>> window
> 
> I put it here because we clear the other flags like 
> force_reset_recovery
> also. I have been considering moving the check ->wait_for_reset and the
> rtnl lock also into do_hard_reset(). I will queue that reorg separate
> from this bug fix.

Since you also have the idea of moving all of them to do_hard_reset in 
the future,
why not just put the adapter->failover_pending=false to do_hard_reset 
now?

> 
>>    this way.
>>    Extending the window that long may cause some resets being
>>    skipped in some scenarios though I don’t know yet.
> 
> Yes hard to prove, but we have run several tests on this and seems to
> be working.
> 
>>    We have already seen the migration reset being skipped in some 
>> cases.
> 
> Is that happening due to this patch or in general? If a migration 
> occurs
> while failover is pending, we should review the best way to handle 
> that.
> Will failover complete in such a case or should we punt the failover 
> and
> process migration instead? I think that should be addressed separately
> because that window is smaller but is there even without this patch?
> 
>>    So my point is extending the window is kind of risky, and do we 
>> have an
>>    alternative to address the "open” problem you want to solve 
>> originally?
>>    For example, would it be a viable approach to only change the code 
>> in
>>    ibmvnic_open() or __ibmvnic_open(), but not extend this window?
> 
> Not sure. We could try and block the open until failover is completed
> but a) that could still timeout the application and b) Existing code
> "pretends" that failover occurred "just after" open succeeded, so marks
> the open successful and lets the failover complete the open.
> 

Please explain it in the commit message in v3 such that it is easier to 
track
the history by future contributors; this version has format issue 
anyway.

Other than that, I am fine with the current approach.

Reviewed-by: Lijun Pan <ljp@linux.ibm.com>

> Besides, we should also not assume that the failover window will be
> short right?
> 
>> 
>>    /* Transport event occurred during previous reset */
>>    if (adapter->wait_for_reset) {
>>    /* Previous was CHANGE_PARAM; caller locked */
>>    @@ -2275,9 +2298,15 @@ static int ibmvnic_reset(struct 
>> ibmvnic_adapter
>>    *adapter,
>>    unsigned long flags;
>>    int ret;
>>    + /*
>>    + * If failover is pending don't schedule any other reset.
>>    + * Instead let the failover complete. If there is already a
>>    + * a failover reset scheduled, we will detect and drop the
>>    + * duplicate reset when walking the ->rwi_list below.
>>    + */
>>    if (adapter->state == VNIC_REMOVING ||
>>       adapter->state == VNIC_REMOVED ||
>>    -    adapter->failover_pending) {
>>    +    (adapter->failover_pending && reason != VNIC_RESET_FAILOVER)) 
>> {
>>    ret = EBUSY;
>>    netdev_dbg(netdev, "Adapter removing or pending failover, skipping
>>    reset\n");
>>    goto err;
>>    @@ -4653,7 +4682,6 @@ static void ibmvnic_handle_crq(union 
>> ibmvnic_crq
>>    *crq,
>>    case IBMVNIC_CRQ_INIT:
>>    dev_info(dev, "Partner initialized\n");
>>    adapter->from_passive_init = true;
>>    - adapter->failover_pending = false;
>>    if (!completion_done(&adapter->init_done)) {
>>    complete(&adapter->init_done);
>>    adapter->init_done_rc = -EIO;
>>    --
>>    2.25.4
>> 
>> References
>> 
>>    1. mailto:sukadev@linux.ibm.com
>>    2. mailto:sukadev@linux.vnet.ibm.com
>>    3. mailto:sukadev@linux.ibm.com
diff mbox series

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 1b702a43a5d0..2a0f6f6820db 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1197,18 +1197,27 @@  static int ibmvnic_open(struct net_device *netdev)
 	if (adapter->state != VNIC_CLOSED) {
 		rc = ibmvnic_login(netdev);
 		if (rc)
-			return rc;
+			goto out;
 
 		rc = init_resources(adapter);
 		if (rc) {
 			netdev_err(netdev, "failed to initialize resources\n");
 			release_resources(adapter);
-			return rc;
+			goto out;
 		}
 	}
 
 	rc = __ibmvnic_open(netdev);
 
+out:
+	/*
+	 * If open fails due to a pending failover, set device state and
+	 * return. Device operation will be handled by reset routine.
+	 */
+	if (rc && adapter->failover_pending) {
+		adapter->state = VNIC_OPEN;
+		rc = 0;
+	}
 	return rc;
 }
 
@@ -1931,6 +1940,13 @@  static int do_reset(struct ibmvnic_adapter *adapter,
 		   rwi->reset_reason);
 
 	rtnl_lock();
+	/*
+	 * Now that we have the rtnl lock, clear any pending failover.
+	 * This will ensure ibmvnic_open() has either completed or will
+	 * block until failover is complete.
+	 */
+	if (rwi->reset_reason == VNIC_RESET_FAILOVER)
+		adapter->failover_pending = false;
 
 	netif_carrier_off(netdev);
 	adapter->reset_reason = rwi->reset_reason;
@@ -2211,6 +2227,13 @@  static void __ibmvnic_reset(struct work_struct *work)
 			/* CHANGE_PARAM requestor holds rtnl_lock */
 			rc = do_change_param_reset(adapter, rwi, reset_state);
 		} else if (adapter->force_reset_recovery) {
+			/*
+			 * Since we are doing a hard reset now, clear the
+			 * failover_pending flag so we don't ignore any
+			 * future MOBILITY or other resets.
+			 */
+			adapter->failover_pending = false;
+
 			/* Transport event occurred during previous reset */
 			if (adapter->wait_for_reset) {
 				/* Previous was CHANGE_PARAM; caller locked */
@@ -2275,9 +2298,15 @@  static int ibmvnic_reset(struct ibmvnic_adapter *adapter,
 	unsigned long flags;
 	int ret;
 
+	/*
+	 * If failover is pending don't schedule any other reset.
+	 * Instead let the failover complete. If there is already a
+	 * a failover reset scheduled, we will detect and drop the
+	 * duplicate reset when walking the ->rwi_list below.
+	 */
 	if (adapter->state == VNIC_REMOVING ||
 	    adapter->state == VNIC_REMOVED ||
-	    adapter->failover_pending) {
+	    (adapter->failover_pending && reason != VNIC_RESET_FAILOVER)) {
 		ret = EBUSY;
 		netdev_dbg(netdev, "Adapter removing or pending failover, skipping reset\n");
 		goto err;
@@ -4653,7 +4682,6 @@  static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 		case IBMVNIC_CRQ_INIT:
 			dev_info(dev, "Partner initialized\n");
 			adapter->from_passive_init = true;
-			adapter->failover_pending = false;
 			if (!completion_done(&adapter->init_done)) {
 				complete(&adapter->init_done);
 				adapter->init_done_rc = -EIO;