Message ID | 1396750050-7183-1-git-send-email-weiyang@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
From: Wei Yang <weiyang@linux.vnet.ibm.com> Date: Sun, 6 Apr 2014 10:07:30 +0800 > Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass > pci_device_id.driver_data to __mlx4_init_one during reset". > > pci_match_id() just match the static pci_device_id, which may return NULL if > someone binds the driver to a device manually using > /sys/bus/pci/drivers/.../new_id. > > This patch wrap up a helper function __mlx4_remove_one() which does the tear > down function but preserve the drv_data. Functions like > mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out > releasing drvdata. > > Tested on ConnectX-3. > > CC: Bjorn Helgaas <bhelgaas@google.com> > CC: Amir Vadai <amirv@mellanox.com> > Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Please resubmit this when the net-next tree opens back up, thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/04/2014 21:43, David Miller wrote: > From: Wei Yang <weiyang@linux.vnet.ibm.com> > Date: Sun, 6 Apr 2014 10:07:30 +0800 > >> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass >> pci_device_id.driver_data to __mlx4_init_one during reset". >> >> pci_match_id() just match the static pci_device_id, which may return NULL if >> someone binds the driver to a device manually using >> /sys/bus/pci/drivers/.../new_id. >> >> This patch wrap up a helper function __mlx4_remove_one() which does the tear >> down function but preserve the drv_data. Functions like >> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out >> releasing drvdata. >> >> Tested on ConnectX-3. >> >> CC: Bjorn Helgaas <bhelgaas@google.com> >> CC: Amir Vadai <amirv@mellanox.com> >> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> > Please resubmit this when the net-next tree opens back up, thanks. Dave, this patch is for net, since it fixes an issue in the current code (actually it relates to a patch that was merged after 3.14-rc7, so it would eventually might go to -stable too). The author wasn't very familiar with the exact differences/nature of net vs. net-next, so it might created some confusion here, I'v sent him few private notes to explain how things go... Or. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/04/2014 05:07, Wei Yang wrote: > Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass > pci_device_id.driver_data to __mlx4_init_one during reset". > > pci_match_id() just match the static pci_device_id, which may return NULL if > someone binds the driver to a device manually using > /sys/bus/pci/drivers/.../new_id. > > This patch wrap up a helper function __mlx4_remove_one() which does the tear > down function but preserve the drv_data. Functions like > mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out > releasing drvdata. > > Tested on ConnectX-3. > > CC: Bjorn Helgaas <bhelgaas@google.com> > CC: Amir Vadai <amirv@mellanox.com> > Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> > --- > drivers/net/ethernet/mellanox/mlx4/main.c | 165 +++++++++++++++-------------- > 1 file changed, 88 insertions(+), 77 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c > index aa54ef7..9dd23e0 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/main.c > +++ b/drivers/net/ethernet/mellanox/mlx4/main.c > @@ -2268,13 +2268,8 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data) > /* Allow large DMA segments, up to the firmware limit of 1 GB */ > dma_set_max_seg_size(&pdev->dev, 1024 * 1024 * 1024); > > - priv = kzalloc(sizeof(*priv), GFP_KERNEL); > - if (!priv) { > - err = -ENOMEM; > - goto err_release_regions; > - } > - > - dev = &priv->dev; > + dev = pci_get_drvdata(pdev); > + priv = mlx4_priv(dev); > dev->pdev = pdev; > INIT_LIST_HEAD(&priv->ctx_list); > spin_lock_init(&priv->ctx_lock); Here are some comments from Jack, please make sure to CC Jack Morgenstein <jackm@dev.mellanox.co.il> on V2 of the patch --> There is a change in behavior here, which I am concerned about. Before this patch, __mlx4_init_one() allocated a zeroed-out structure for priv, in all circumstances. Now, this structure is passed "dirty" to __mlx4_init_one from some flows (e.g., mlx4_restart_one). There may, in fact, be fields in the structure which should be "zeroed out" and are not, because previously there was no reason to be strict about this. Add a line with memset(priv, 0, sizeof(*priv)); after setting the priv variable above and also do NOT remove the following line: - priv->pci_dev_data = pci_dev_data; -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 08, 2014 at 09:48:30AM +0300, Or Gerlitz wrote: >On 07/04/2014 21:43, David Miller wrote: >>From: Wei Yang <weiyang@linux.vnet.ibm.com> >>Date: Sun, 6 Apr 2014 10:07:30 +0800 >> >>>Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass >>>pci_device_id.driver_data to __mlx4_init_one during reset". >>> >>>pci_match_id() just match the static pci_device_id, which may return NULL if >>>someone binds the driver to a device manually using >>>/sys/bus/pci/drivers/.../new_id. >>> >>>This patch wrap up a helper function __mlx4_remove_one() which does the tear >>>down function but preserve the drv_data. Functions like >>>mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out >>>releasing drvdata. >>> >>>Tested on ConnectX-3. >>> >>>CC: Bjorn Helgaas <bhelgaas@google.com> >>>CC: Amir Vadai <amirv@mellanox.com> >>>Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> >>Please resubmit this when the net-next tree opens back up, thanks. > >Dave, this patch is for net, since it fixes an issue in the current >code (actually it relates to a patch >that was merged after 3.14-rc7, so it would eventually might go to >-stable too). The author wasn't very >familiar with the exact differences/nature of net vs. net-next, so it >might created some confusion here, I'v >sent him few private notes to explain how things go... Yes, sorry for the confusion. Thanks Or for your patient explanation, while I still make mistakes. > >Or.
From: Or Gerlitz <ogerlitz@mellanox.com> Date: Tue, 8 Apr 2014 09:48:30 +0300 > On 07/04/2014 21:43, David Miller wrote: >> From: Wei Yang <weiyang@linux.vnet.ibm.com> >> Date: Sun, 6 Apr 2014 10:07:30 +0800 >> >>> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass >>> pci_device_id.driver_data to __mlx4_init_one during reset". >>> >>> pci_match_id() just match the static pci_device_id, which may return >>> NULL if >>> someone binds the driver to a device manually using >>> /sys/bus/pci/drivers/.../new_id. >>> >>> This patch wrap up a helper function __mlx4_remove_one() which does >>> the tear >>> down function but preserve the drv_data. Functions like >>> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with >>> out >>> releasing drvdata. >>> >>> Tested on ConnectX-3. >>> >>> CC: Bjorn Helgaas <bhelgaas@google.com> >>> CC: Amir Vadai <amirv@mellanox.com> >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> >> Please resubmit this when the net-next tree opens back up, thanks. > > Dave, this patch is for net, since it fixes an issue in the current > code (actually it relates to a patch > that was merged after 3.14-rc7, so it would eventually might go to > -stable too). The author wasn't very > familiar with the exact differences/nature of net vs. net-next, so it > might created some confusion here, I'v > sent him few private notes to explain how things go... Then please resubmit this patch with the proper subject line, th ank you. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 8, 2014 at 7:25 PM, David Miller <davem@davemloft.net> wrote: > > From: Or Gerlitz <ogerlitz@mellanox.com> > Date: Tue, 8 Apr 2014 09:48:30 +0300 > > > On 07/04/2014 21:43, David Miller wrote: > >> From: Wei Yang <weiyang@linux.vnet.ibm.com> > >> Date: Sun, 6 Apr 2014 10:07:30 +0800 > >> > >>> Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass > >>> pci_device_id.driver_data to __mlx4_init_one during reset". > >>> > >>> pci_match_id() just match the static pci_device_id, which may return > >>> NULL if > >>> someone binds the driver to a device manually using > >>> /sys/bus/pci/drivers/.../new_id. > >>> > >>> This patch wrap up a helper function __mlx4_remove_one() which does > >>> the tear > >>> down function but preserve the drv_data. Functions like > >>> mlx4_pci_err_detected() and mlx4_restart_one() will call this one with > >>> out > >>> releasing drvdata. > >>> > >>> Tested on ConnectX-3. > >>> > >>> CC: Bjorn Helgaas <bhelgaas@google.com> > >>> CC: Amir Vadai <amirv@mellanox.com> > >>> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> > >> Please resubmit this when the net-next tree opens back up, thanks. > > > > Dave, this patch is for net, since it fixes an issue in the current > > code (actually it relates to a patch > > that was merged after 3.14-rc7, so it would eventually might go to > > -stable too). The author wasn't very > > familiar with the exact differences/nature of net vs. net-next, so it > > might created some confusion here, I'v > > sent him few private notes to explain how things go... > > Then please resubmit this patch with the proper subject line, th ank you. We've sent some feedback on the patch to the author along with crash net vs. net-next course, I really hope V1 he will post with the proper tag... -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index aa54ef7..9dd23e0 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -2268,13 +2268,8 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data) /* Allow large DMA segments, up to the firmware limit of 1 GB */ dma_set_max_seg_size(&pdev->dev, 1024 * 1024 * 1024); - priv = kzalloc(sizeof(*priv), GFP_KERNEL); - if (!priv) { - err = -ENOMEM; - goto err_release_regions; - } - - dev = &priv->dev; + dev = pci_get_drvdata(pdev); + priv = mlx4_priv(dev); dev->pdev = pdev; INIT_LIST_HEAD(&priv->ctx_list); spin_lock_init(&priv->ctx_lock); @@ -2453,9 +2448,6 @@ slave_start: mlx4_sense_init(dev); mlx4_start_sense(dev); - priv->pci_dev_data = pci_dev_data; - pci_set_drvdata(pdev, dev); - return 0; err_port: @@ -2520,84 +2512,101 @@ err_disable_pdev: static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id) { + struct mlx4_priv *priv; + struct mlx4_dev *dev; + printk_once(KERN_INFO "%s", mlx4_version); + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) { + return -ENOMEM; + } + + dev = &priv->dev; + pci_set_drvdata(pdev, dev); + priv->pci_dev_data = id->driver_data; + return __mlx4_init_one(pdev, id->driver_data); } -static void mlx4_remove_one(struct pci_dev *pdev) +static void __mlx4_remove_one(struct pci_dev *pdev) { struct mlx4_dev *dev = pci_get_drvdata(pdev); struct mlx4_priv *priv = mlx4_priv(dev); int p; - if (dev) { - /* in SRIOV it is not allowed to unload the pf's - * driver while there are alive vf's */ - if (mlx4_is_master(dev)) { - if (mlx4_how_many_lives_vf(dev)) - printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n"); - } - mlx4_stop_sense(dev); - mlx4_unregister_device(dev); + /* in SRIOV it is not allowed to unload the pf's + * driver while there are alive vf's */ + if (mlx4_is_master(dev)) { + if (mlx4_how_many_lives_vf(dev)) + printk(KERN_ERR "Removing PF when there are assigned VF's !!!\n"); + } + mlx4_stop_sense(dev); + mlx4_unregister_device(dev); - for (p = 1; p <= dev->caps.num_ports; p++) { - mlx4_cleanup_port_info(&priv->port[p]); - mlx4_CLOSE_PORT(dev, p); - } + for (p = 1; p <= dev->caps.num_ports; p++) { + mlx4_cleanup_port_info(&priv->port[p]); + mlx4_CLOSE_PORT(dev, p); + } - if (mlx4_is_master(dev)) - mlx4_free_resource_tracker(dev, - RES_TR_FREE_SLAVES_ONLY); - - mlx4_cleanup_counters_table(dev); - mlx4_cleanup_qp_table(dev); - mlx4_cleanup_srq_table(dev); - mlx4_cleanup_cq_table(dev); - mlx4_cmd_use_polling(dev); - mlx4_cleanup_eq_table(dev); - mlx4_cleanup_mcg_table(dev); - mlx4_cleanup_mr_table(dev); - mlx4_cleanup_xrcd_table(dev); - mlx4_cleanup_pd_table(dev); + if (mlx4_is_master(dev)) + mlx4_free_resource_tracker(dev, + RES_TR_FREE_SLAVES_ONLY); - if (mlx4_is_master(dev)) - mlx4_free_resource_tracker(dev, - RES_TR_FREE_STRUCTS_ONLY); - - iounmap(priv->kar); - mlx4_uar_free(dev, &priv->driver_uar); - mlx4_cleanup_uar_table(dev); - if (!mlx4_is_slave(dev)) - mlx4_clear_steering(dev); - mlx4_free_eq_table(dev); - if (mlx4_is_master(dev)) - mlx4_multi_func_cleanup(dev); - mlx4_close_hca(dev); - if (mlx4_is_slave(dev)) - mlx4_multi_func_cleanup(dev); - mlx4_cmd_cleanup(dev); - - if (dev->flags & MLX4_FLAG_MSI_X) - pci_disable_msix(pdev); - if (dev->flags & MLX4_FLAG_SRIOV) { - mlx4_warn(dev, "Disabling SR-IOV\n"); - pci_disable_sriov(pdev); - } + mlx4_cleanup_counters_table(dev); + mlx4_cleanup_qp_table(dev); + mlx4_cleanup_srq_table(dev); + mlx4_cleanup_cq_table(dev); + mlx4_cmd_use_polling(dev); + mlx4_cleanup_eq_table(dev); + mlx4_cleanup_mcg_table(dev); + mlx4_cleanup_mr_table(dev); + mlx4_cleanup_xrcd_table(dev); + mlx4_cleanup_pd_table(dev); - if (!mlx4_is_slave(dev)) - mlx4_free_ownership(dev); + if (mlx4_is_master(dev)) + mlx4_free_resource_tracker(dev, + RES_TR_FREE_STRUCTS_ONLY); - kfree(dev->caps.qp0_tunnel); - kfree(dev->caps.qp0_proxy); - kfree(dev->caps.qp1_tunnel); - kfree(dev->caps.qp1_proxy); + iounmap(priv->kar); + mlx4_uar_free(dev, &priv->driver_uar); + mlx4_cleanup_uar_table(dev); + if (!mlx4_is_slave(dev)) + mlx4_clear_steering(dev); + mlx4_free_eq_table(dev); + if (mlx4_is_master(dev)) + mlx4_multi_func_cleanup(dev); + mlx4_close_hca(dev); + if (mlx4_is_slave(dev)) + mlx4_multi_func_cleanup(dev); + mlx4_cmd_cleanup(dev); - kfree(priv); - pci_release_regions(pdev); - pci_disable_device(pdev); - pci_set_drvdata(pdev, NULL); + if (dev->flags & MLX4_FLAG_MSI_X) + pci_disable_msix(pdev); + if (dev->flags & MLX4_FLAG_SRIOV) { + mlx4_warn(dev, "Disabling SR-IOV\n"); + pci_disable_sriov(pdev); } + + if (!mlx4_is_slave(dev)) + mlx4_free_ownership(dev); + + kfree(dev->caps.qp0_tunnel); + kfree(dev->caps.qp0_proxy); + kfree(dev->caps.qp1_tunnel); + kfree(dev->caps.qp1_proxy); + pci_release_regions(pdev); + pci_disable_device(pdev); +} + +static void mlx4_remove_one(struct pci_dev *pdev) +{ + struct mlx4_dev *dev = pci_get_drvdata(pdev); + struct mlx4_priv *priv = mlx4_priv(dev); + + __mlx4_remove_one(pdev); + kfree(priv); + pci_set_drvdata(pdev, NULL); } int mlx4_restart_one(struct pci_dev *pdev) @@ -2607,7 +2616,7 @@ int mlx4_restart_one(struct pci_dev *pdev) int pci_dev_data; pci_dev_data = priv->pci_dev_data; - mlx4_remove_one(pdev); + __mlx4_remove_one(pdev); return __mlx4_init_one(pdev, pci_dev_data); } @@ -2662,7 +2671,7 @@ MODULE_DEVICE_TABLE(pci, mlx4_pci_table); static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t state) { - mlx4_remove_one(pdev); + __mlx4_remove_one(pdev); return state == pci_channel_io_perm_failure ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET; @@ -2670,11 +2679,13 @@ static pci_ers_result_t mlx4_pci_err_detected(struct pci_dev *pdev, static pci_ers_result_t mlx4_pci_slot_reset(struct pci_dev *pdev) { - const struct pci_device_id *id; - int ret; + struct mlx4_dev *dev = pci_get_drvdata(pdev); + struct mlx4_priv *priv = mlx4_priv(dev); + int pci_dev_data; + int ret; - id = pci_match_id(mlx4_pci_table, pdev); - ret = __mlx4_init_one(pdev, id->driver_data); + pci_dev_data = priv->pci_dev_data; + ret = __mlx4_init_one(pdev, pci_dev_data); return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; }
Fix issue introduced by commit: 97a5221 "net/mlx4_core: pass pci_device_id.driver_data to __mlx4_init_one during reset". pci_match_id() just match the static pci_device_id, which may return NULL if someone binds the driver to a device manually using /sys/bus/pci/drivers/.../new_id. This patch wrap up a helper function __mlx4_remove_one() which does the tear down function but preserve the drv_data. Functions like mlx4_pci_err_detected() and mlx4_restart_one() will call this one with out releasing drvdata. Tested on ConnectX-3. CC: Bjorn Helgaas <bhelgaas@google.com> CC: Amir Vadai <amirv@mellanox.com> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> --- drivers/net/ethernet/mellanox/mlx4/main.c | 165 +++++++++++++++-------------- 1 file changed, 88 insertions(+), 77 deletions(-)