[net,v2] qed: rdma - don't wait for resources under hw error recovery flow

Message ID 20210922073631.31626-1-smalin@marvell.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net,v2] qed: rdma - don't wait for resources under hw error recovery flow

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 2 blamed authors not CCed: michal.kalderon@cavium.com tomer.tayar@cavium.com; 3 maintainers not CCed: michal.kalderon@cavium.com GR-everest-linux-l2@marvell.com tomer.tayar@cavium.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 28 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Shai Malin Sept. 22, 2021, 7:36 a.m. UTC
If the HW device is during recovery, the HW resources will never return,
hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
This fix speeds up the error recovery flow.

Changes since v1:
- Fix race condition (thanks to Leon Romanovsky).

Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery")
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
---
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 8 ++++++++
 drivers/net/ethernet/qlogic/qed/qed_roce.c  | 8 ++++++++
 2 files changed, 16 insertions(+)

Comments

Leon Romanovsky Sept. 22, 2021, 9:40 a.m. UTC | #1
On Wed, Sep 22, 2021 at 10:36:31AM +0300, Shai Malin wrote:
> If the HW device is during recovery, the HW resources will never return,
> hence we shouldn't wait for the CID (HW context ID) bitmaps to clear.
> This fix speeds up the error recovery flow.
> 
> Changes since v1:
> - Fix race condition (thanks to Leon Romanovsky).

Please put the changelog under "---"; there is little value in having it in
the commit message.

> 
> Fixes: 64515dc899df ("qed: Add infrastructure for error detection and recovery")
> Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
> Signed-off-by: Ariel Elior <aelior@marvell.com>
> Signed-off-by: Shai Malin <smalin@marvell.com>
> ---
>  drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 8 ++++++++
>  drivers/net/ethernet/qlogic/qed/qed_roce.c  | 8 ++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> index fc8b3e64f153..186d0048a9d1 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
> @@ -1297,6 +1297,14 @@ qed_iwarp_wait_cid_map_cleared(struct qed_hwfn *p_hwfn, struct qed_bmap *bmap)
>  	prev_weight = weight;
>  
>  	while (weight) {
> +		/* If the HW device is during recovery, all resources are
> +		 * immediately reset without receiving a per-cid indication
> +		 * from HW. In this case we don't expect the cid_map to be
> +		 * cleared.
> +		 */
> +		if (p_hwfn->cdev->recov_in_prog)
> +			return 0;
> +
>  		msleep(QED_IWARP_MAX_CID_CLEAN_TIME);
>  
>  		weight = bitmap_weight(bmap->bitmap, bmap->max_count);
> diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
> index f16a157bb95a..cf5baa5e59bc 100644
> --- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
> +++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
> @@ -77,6 +77,14 @@ void qed_roce_stop(struct qed_hwfn *p_hwfn)
>  	 * Beyond the added delay we clear the bitmap anyway.
>  	 */
>  	while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
> +		/* If the HW device is during recovery, all resources are
> +		 * immediately reset without receiving a per-cid indication
> +		 * from HW. In this case we don't expect the cid bitmap to be
> +		 * cleared.
> +		 */
> +		if (p_hwfn->cdev->recov_in_prog)
> +			return;
> +
>  		msleep(100);
>  		if (wait_count++ > 20) {
>  			DP_NOTICE(p_hwfn, "cid bitmap wait timed out\n");
> -- 
> 2.27.0
>
Patch

diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index fc8b3e64f153..186d0048a9d1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -1297,6 +1297,14 @@  qed_iwarp_wait_cid_map_cleared(struct qed_hwfn *p_hwfn, struct qed_bmap *bmap)
 	prev_weight = weight;
 
 	while (weight) {
+		/* If the HW device is during recovery, all resources are
+		 * immediately reset without receiving a per-cid indication
+		 * from HW. In this case we don't expect the cid_map to be
+		 * cleared.
+		 */
+		if (p_hwfn->cdev->recov_in_prog)
+			return 0;
+
 		msleep(QED_IWARP_MAX_CID_CLEAN_TIME);
 
 		weight = bitmap_weight(bmap->bitmap, bmap->max_count);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index f16a157bb95a..cf5baa5e59bc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -77,6 +77,14 @@  void qed_roce_stop(struct qed_hwfn *p_hwfn)
 	 * Beyond the added delay we clear the bitmap anyway.
 	 */
 	while (bitmap_weight(rcid_map->bitmap, rcid_map->max_count)) {
+		/* If the HW device is during recovery, all resources are
+		 * immediately reset without receiving a per-cid indication
+		 * from HW. In this case we don't expect the cid bitmap to be
+		 * cleared.
+		 */
+		if (p_hwfn->cdev->recov_in_prog)
+			return;
+
 		msleep(100);
 		if (wait_count++ > 20) {
 			DP_NOTICE(p_hwfn, "cid bitmap wait timed out\n");
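
Both hunks above add the same check to a polling loop: while waiting for the CID bitmap to drain, bail out at once if recovery is in progress. The following is a minimal user-space sketch of that pattern, not the driver code. All names in it (`fake_dev`, `wait_cids_cleared`, `release_one_cid`, `no_progress`) are hypothetical stand-ins, a plain counter replaces the bitmap weight, and a callback replaces the kernel's msleep().

```c
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real qed structures. */
struct fake_dev {
	bool recov_in_prog;      /* plays the role of cdev->recov_in_prog */
	unsigned int busy_cids;  /* plays the role of the CID bitmap weight */
};

enum wait_result { WAIT_CLEARED, WAIT_RECOVERY, WAIT_TIMEOUT };

/*
 * Poll until no CIDs are busy. Give up immediately if recovery is in
 * progress, since the hardware will never report the CIDs as released.
 * poll_delay stands in for msleep() and lets a caller simulate progress.
 */
static enum wait_result wait_cids_cleared(struct fake_dev *dev, int max_polls,
					  void (*poll_delay)(struct fake_dev *))
{
	int polls = 0;

	while (dev->busy_cids) {
		if (dev->recov_in_prog)
			return WAIT_RECOVERY;

		if (polls++ >= max_polls)
			return WAIT_TIMEOUT;

		poll_delay(dev);
	}

	return WAIT_CLEARED;
}

/* Simulated delay during which the hardware releases one CID. */
static void release_one_cid(struct fake_dev *dev)
{
	if (dev->busy_cids)
		dev->busy_cids--;
}

/* Simulated delay during which nothing is released. */
static void no_progress(struct fake_dev *dev)
{
	(void)dev;
}
```

The sketch returns a distinct result for the recovery case so a caller can tell it apart from a normal drain. The patch itself returns 0 from the iWARP helper and returns early from qed_roce_stop().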