diff mbox series

[net] net/mlx5: fix calling mlx5_cmd_init() before DMA mask is set

Message ID 20230928-mlx5_init_fix-v1-1-79749d45ce60@linux.ibm.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] net/mlx5: fix calling mlx5_cmd_init() before DMA mask is set | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1340 this patch: 1340
netdev/cc_maintainers success CCed 10 of 10 maintainers
netdev/build_clang success Errors and warnings before: 1363 this patch: 1363
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1363 this patch: 1363
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 32 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Niklas Schnelle Sept. 28, 2023, 1:55 p.m. UTC
Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
reload routines") mlx5_cmd_init() is called in mlx5_mdev_init() which is
called in probe_one() before mlx5_pci_init(). This is a problem because
mlx5_pci_init() is where the DMA and coherent mask is set but
mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
allocation is done during probe before the correct mask is set. This
causes probe to fail initialization of the cmdif SW structs on s390x
after that is converted to the common dma-iommu code. This is because on
s390x DMA addresses below 4 GiB are reserved on current machines and
unlike the old s390x specific DMA API implementation common code
enforces DMA masks. Fix this by switching the order of the
mlx5_mdev_init() and mlx5_pci_init() in probe_one().

Link: https://lore.kernel.org/linux-iommu/cfc9e9128ed5571d2e36421e347301057662a09e.camel@linux.ibm.com/
Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
Note: I ran into this while testing the linked series for converting
s390x to use dma-iommu. The existing s390x specific DMA API
implementation doesn't respect DMA masks and is thus not affected
despite of course also only supporting DMA addresses above 4 GiB.
That said ConnectX VFs are the primary users of native PCI on s390x and
we'd really like to get the DMA API conversion into v6.7 so this has
high priority for us.
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)


---
base-commit: 6465e260f48790807eef06b583b38ca9789b6072
change-id: 20230928-mlx5_init_fix-c465b5cda327

Best regards,

Comments

Leon Romanovsky Sept. 28, 2023, 5:59 p.m. UTC | #1
On Thu, Sep 28, 2023 at 03:55:47PM +0200, Niklas Schnelle wrote:
> Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
> reload routines") mlx5_cmd_init() is called in mlx5_mdev_init() which is
> called in probe_one() before mlx5_pci_init(). This is a problem because
> mlx5_pci_init() is where the DMA and coherent mask is set but
> mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
> allocation is done during probe before the correct mask is set. This
> causes probe to fail initialization of the cmdif SW structs on s390x
> after that is converted to the common dma-iommu code. This is because on
> s390x DMA addresses below 4 GiB are reserved on current machines and
> unlike the old s390x specific DMA API implementation common code
> enforces DMA masks. Fix this by switching the order of the
> mlx5_mdev_init() and mlx5_pci_init() in probe_one().
> 
> Link: https://lore.kernel.org/linux-iommu/cfc9e9128ed5571d2e36421e347301057662a09e.camel@linux.ibm.com/
> Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
> Note: I ran into this while testing the linked series for converting
> s390x to use dma-iommu. The existing s390x specific DMA API
> implementation doesn't respect DMA masks and is thus not affected
> despite of course also only supporting DMA addresses above 4 GiB.
> That said ConnectX VFs are the primary users of native PCI on s390x and
> we'd really like to get the DMA API conversion into v6.7 so this has
> high priority for us.
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 15561965d2af..06744dedd928 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1908,10 +1908,6 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  		goto adev_init_err;
>  	}
>  
> -	err = mlx5_mdev_init(dev, prof_sel);
> -	if (err)
> -		goto mdev_init_err;
> -
>  	err = mlx5_pci_init(dev, pdev, id);
>  	if (err) {
>  		mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
> @@ -1919,6 +1915,10 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  		goto pci_init_err;
>  	}
>  
> +	err = mlx5_mdev_init(dev, prof_sel);
> +	if (err)
> +		goto mdev_init_err;
> +

I had something different in mind as I'm worry that call to pci_enable_device()
in mlx5_pci_init() before we finished FW command interface initialization is a bit
premature.

What about the following patch?

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 15561965d2af..31f1d701116a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -905,12 +905,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
 
        pci_set_master(pdev);
 
-       err = set_dma_caps(pdev);
-       if (err) {
-               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
-               goto err_clr_master;
-       }
-
        if (pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP32) &&
            pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64) &&
            pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128))
@@ -1908,9 +1902,15 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
                goto adev_init_err;
        }
 
+       err = set_dma_caps(pdev);
+       if (err) {
+               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
+               goto dma_cap_err;
+       }
+
        err = mlx5_mdev_init(dev, prof_sel);
        if (err)
-               goto mdev_init_err;
+               goto dma_cap_err;
 
        err = mlx5_pci_init(dev, pdev, id);
        if (err) {
@@ -1942,7 +1942,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
        mlx5_pci_close(dev);
 pci_init_err:
        mlx5_mdev_uninit(dev);
-mdev_init_err:
+dma_cap_err:
        mlx5_adev_idx_free(dev->priv.adev_idx);
 adev_init_err:
        mlx5_devlink_free(devlink);

Thanks
Niklas Schnelle Sept. 29, 2023, 9:40 a.m. UTC | #2
On Thu, 2023-09-28 at 20:59 +0300, Leon Romanovsky wrote:
> On Thu, Sep 28, 2023 at 03:55:47PM +0200, Niklas Schnelle wrote:
> > Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
> > reload routines") mlx5_cmd_init() is called in mlx5_mdev_init() which is
> > called in probe_one() before mlx5_pci_init(). This is a problem because
> > mlx5_pci_init() is where the DMA and coherent mask is set but
> > mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
> > allocation is done during probe before the correct mask is set. This
> > causes probe to fail initialization of the cmdif SW structs on s390x
> > after that is converted to the common dma-iommu code. This is because on
> > s390x DMA addresses below 4 GiB are reserved on current machines and
> > unlike the old s390x specific DMA API implementation common code
> > enforces DMA masks. Fix this by switching the order of the
> > mlx5_mdev_init() and mlx5_pci_init() in probe_one().
> > 
> > Link: https://lore.kernel.org/linux-iommu/cfc9e9128ed5571d2e36421e347301057662a09e.camel@linux.ibm.com/
> > Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
> > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > ---
> > Note: I ran into this while testing the linked series for converting
> > s390x to use dma-iommu. The existing s390x specific DMA API
> > implementation doesn't respect DMA masks and is thus not affected
> > despite of course also only supporting DMA addresses above 4 GiB.
> > That said ConnectX VFs are the primary users of native PCI on s390x and
> > we'd really like to get the DMA API conversion into v6.7 so this has
> > high priority for us.
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index 15561965d2af..06744dedd928 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -1908,10 +1908,6 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> >  		goto adev_init_err;
> >  	}
> >  
> > -	err = mlx5_mdev_init(dev, prof_sel);
> > -	if (err)
> > -		goto mdev_init_err;
> > -
> >  	err = mlx5_pci_init(dev, pdev, id);
> >  	if (err) {
> >  		mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
> > @@ -1919,6 +1915,10 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> >  		goto pci_init_err;
> >  	}
> >  
> > +	err = mlx5_mdev_init(dev, prof_sel);
> > +	if (err)
> > +		goto mdev_init_err;
> > +
> 
> I had something different in mind as I'm worry that call to pci_enable_device()
> in mlx5_pci_init() before we finished FW command interface initialization is a bit
> premature.
> 
> What about the following patch?
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 15561965d2af..31f1d701116a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -905,12 +905,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
>  
>         pci_set_master(pdev);
>  
> -       err = set_dma_caps(pdev);
> -       if (err) {
> -               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
> -               goto err_clr_master;
> -       }
> -
>         if (pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP32) &&
>             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64) &&
>             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128))
> @@ -1908,9 +1902,15 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>                 goto adev_init_err;
>         }
>  
> +       err = set_dma_caps(pdev);
> +       if (err) {
> +               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
> +               goto dma_cap_err;
> +       }
> +
>         err = mlx5_mdev_init(dev, prof_sel);
>         if (err)
> -               goto mdev_init_err;
> +               goto dma_cap_err;
>  
>         err = mlx5_pci_init(dev, pdev, id);
>         if (err) {
> @@ -1942,7 +1942,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>         mlx5_pci_close(dev);
>  pci_init_err:
>         mlx5_mdev_uninit(dev);
> -mdev_init_err:
> +dma_cap_err:
>         mlx5_adev_idx_free(dev->priv.adev_idx);
>  adev_init_err:
>         mlx5_devlink_free(devlink);
> 
> Thanks

The above works too. Maybe for consistency within probe_one() it would
then make sense to also rename set_dma_caps() to mlx5_dma_init()?

Thanks,
Niklas
Leon Romanovsky Sept. 29, 2023, 10:31 a.m. UTC | #3
On Fri, Sep 29, 2023 at 11:40:49AM +0200, Niklas Schnelle wrote:
> On Thu, 2023-09-28 at 20:59 +0300, Leon Romanovsky wrote:
> > On Thu, Sep 28, 2023 at 03:55:47PM +0200, Niklas Schnelle wrote:
> > > Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
> > > reload routines") mlx5_cmd_init() is called in mlx5_mdev_init() which is
> > > called in probe_one() before mlx5_pci_init(). This is a problem because
> > > mlx5_pci_init() is where the DMA and coherent mask is set but
> > > mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
> > > allocation is done during probe before the correct mask is set. This
> > > causes probe to fail initialization of the cmdif SW structs on s390x
> > > after that is converted to the common dma-iommu code. This is because on
> > > s390x DMA addresses below 4 GiB are reserved on current machines and
> > > unlike the old s390x specific DMA API implementation common code
> > > enforces DMA masks. Fix this by switching the order of the
> > > mlx5_mdev_init() and mlx5_pci_init() in probe_one().
> > > 
> > > Link: https://lore.kernel.org/linux-iommu/cfc9e9128ed5571d2e36421e347301057662a09e.camel@linux.ibm.com/
> > > Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > ---
> > > Note: I ran into this while testing the linked series for converting
> > > s390x to use dma-iommu. The existing s390x specific DMA API
> > > implementation doesn't respect DMA masks and is thus not affected
> > > despite of course also only supporting DMA addresses above 4 GiB.
> > > That said ConnectX VFs are the primary users of native PCI on s390x and
> > > we'd really like to get the DMA API conversion into v6.7 so this has
> > > high priority for us.
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > index 15561965d2af..06744dedd928 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > @@ -1908,10 +1908,6 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  		goto adev_init_err;
> > >  	}
> > >  
> > > -	err = mlx5_mdev_init(dev, prof_sel);
> > > -	if (err)
> > > -		goto mdev_init_err;
> > > -
> > >  	err = mlx5_pci_init(dev, pdev, id);
> > >  	if (err) {
> > >  		mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
> > > @@ -1919,6 +1915,10 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> > >  		goto pci_init_err;
> > >  	}
> > >  
> > > +	err = mlx5_mdev_init(dev, prof_sel);
> > > +	if (err)
> > > +		goto mdev_init_err;
> > > +
> > 
> > I had something different in mind as I'm worry that call to pci_enable_device()
> > in mlx5_pci_init() before we finished FW command interface initialization is a bit
> > premature.
> > 
> > What about the following patch?
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index 15561965d2af..31f1d701116a 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -905,12 +905,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
> >  
> >         pci_set_master(pdev);
> >  
> > -       err = set_dma_caps(pdev);
> > -       if (err) {
> > -               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
> > -               goto err_clr_master;
> > -       }
> > -
> >         if (pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP32) &&
> >             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64) &&
> >             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128))
> > @@ -1908,9 +1902,15 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> >                 goto adev_init_err;
> >         }
> >  
> > +       err = set_dma_caps(pdev);
> > +       if (err) {
> > +               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
> > +               goto dma_cap_err;
> > +       }
> > +
> >         err = mlx5_mdev_init(dev, prof_sel);
> >         if (err)
> > -               goto mdev_init_err;
> > +               goto dma_cap_err;
> >  
> >         err = mlx5_pci_init(dev, pdev, id);
> >         if (err) {
> > @@ -1942,7 +1942,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> >         mlx5_pci_close(dev);
> >  pci_init_err:
> >         mlx5_mdev_uninit(dev);
> > -mdev_init_err:
> > +dma_cap_err:
> >         mlx5_adev_idx_free(dev->priv.adev_idx);
> >  adev_init_err:
> >         mlx5_devlink_free(devlink);
> > 
> > Thanks
> 
> The above works too. Maybe for consistency within probe_one() it would
> then make sense to also rename set_dma_caps() to mlx5_dma_init()?

Sounds great, thanks

BTW, I was informed offlist that Saeed also has fix to this issue,
but I don't know if he wants to progress with that fix as it has wrong
RCA in commit message and as an outcome of that much complex solution,
which is not necessary.

So I would be happy to see your patch with mlx5_dma_init().

Thanks

> 
> Thanks,
> Niklas
Saeed Mahameed Oct. 11, 2023, 6:45 p.m. UTC | #4
On 29 Sep 13:31, Leon Romanovsky wrote:
>On Fri, Sep 29, 2023 at 11:40:49AM +0200, Niklas Schnelle wrote:
>> On Thu, 2023-09-28 at 20:59 +0300, Leon Romanovsky wrote:
>> > On Thu, Sep 28, 2023 at 03:55:47PM +0200, Niklas Schnelle wrote:
>> > > Since commit 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and
>> > > reload routines") mlx5_cmd_init() is called in mlx5_mdev_init() which is
>> > > called in probe_one() before mlx5_pci_init(). This is a problem because
>> > > mlx5_pci_init() is where the DMA and coherent mask is set but
>> > > mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA
>> > > allocation is done during probe before the correct mask is set. This
>> > > causes probe to fail initialization of the cmdif SW structs on s390x
>> > > after that is converted to the common dma-iommu code. This is because on
>> > > s390x DMA addresses below 4 GiB are reserved on current machines and
>> > > unlike the old s390x specific DMA API implementation common code
>> > > enforces DMA masks. Fix this by switching the order of the
>> > > mlx5_mdev_init() and mlx5_pci_init() in probe_one().
>> > >
>> > > Link: https://lore.kernel.org/linux-iommu/cfc9e9128ed5571d2e36421e347301057662a09e.camel@linux.ibm.com/
>> > > Fixes: 06cd555f73ca ("net/mlx5: split mlx5_cmd_init() to probe and reload routines")
>> > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> > > ---
>> > > Note: I ran into this while testing the linked series for converting
>> > > s390x to use dma-iommu. The existing s390x specific DMA API
>> > > implementation doesn't respect DMA masks and is thus not affected
>> > > despite of course also only supporting DMA addresses above 4 GiB.
>> > > That said ConnectX VFs are the primary users of native PCI on s390x and
>> > > we'd really like to get the DMA API conversion into v6.7 so this has
>> > > high priority for us.
>> > > ---
>> > >  drivers/net/ethernet/mellanox/mlx5/core/main.c | 12 ++++++------
>> > >  1 file changed, 6 insertions(+), 6 deletions(-)
>> > >
>> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > > index 15561965d2af..06744dedd928 100644
>> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > > @@ -1908,10 +1908,6 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> > >  		goto adev_init_err;
>> > >  	}
>> > >
>> > > -	err = mlx5_mdev_init(dev, prof_sel);
>> > > -	if (err)
>> > > -		goto mdev_init_err;
>> > > -
>> > >  	err = mlx5_pci_init(dev, pdev, id);
>> > >  	if (err) {
>> > >  		mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
>> > > @@ -1919,6 +1915,10 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> > >  		goto pci_init_err;
>> > >  	}
>> > >
>> > > +	err = mlx5_mdev_init(dev, prof_sel);
>> > > +	if (err)
>> > > +		goto mdev_init_err;
>> > > +
>> >
>> > I had something different in mind as I'm worry that call to pci_enable_device()
>> > in mlx5_pci_init() before we finished FW command interface initialization is a bit
>> > premature.
>> >
>> > What about the following patch?
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > index 15561965d2af..31f1d701116a 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
>> > @@ -905,12 +905,6 @@ static int mlx5_pci_init(struct mlx5_core_dev *dev, struct pci_dev *pdev,
>> >
>> >         pci_set_master(pdev);
>> >
>> > -       err = set_dma_caps(pdev);
>> > -       if (err) {
>> > -               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
>> > -               goto err_clr_master;
>> > -       }
>> > -
>> >         if (pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP32) &&
>> >             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64) &&
>> >             pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP128))
>> > @@ -1908,9 +1902,15 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> >                 goto adev_init_err;
>> >         }
>> >
>> > +       err = set_dma_caps(pdev);
>> > +       if (err) {
>> > +               mlx5_core_err(dev, "Failed setting DMA capabilities mask, aborting\n");
>> > +               goto dma_cap_err;
>> > +       }
>> > +
>> >         err = mlx5_mdev_init(dev, prof_sel);
>> >         if (err)
>> > -               goto mdev_init_err;
>> > +               goto dma_cap_err;
>> >
>> >         err = mlx5_pci_init(dev, pdev, id);
>> >         if (err) {
>> > @@ -1942,7 +1942,7 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>> >         mlx5_pci_close(dev);
>> >  pci_init_err:
>> >         mlx5_mdev_uninit(dev);
>> > -mdev_init_err:
>> > +dma_cap_err:
>> >         mlx5_adev_idx_free(dev->priv.adev_idx);
>> >  adev_init_err:
>> >         mlx5_devlink_free(devlink);
>> >
>> > Thanks
>>
>> The above works too. Maybe for consistency within probe_one() it would
>> then make sense to also rename set_dma_caps() to mlx5_dma_init()?
>
>Sounds great, thanks
>
>BTW, I was informed offlist that Saeed also has fix to this issue,
>but I don't know if he wants to progress with that fix as it has wrong
>RCA in commit message and as an outcome of that much complex solution,
>which is not necessary.
>
>So I would be happy to see your patch with mlx5_dma_init().
>
>Thanks
>

Actually I prefer the internal patch, it moves the dma parts out of
mlx5_cmd_init() into mlx5_cmd_enable() which happens after dma caps are
set. since it is using the current mlx5 function structure and breakdown, 
I prefer it over adding new function to the driver.

I will share the patch, I will let Niklas test it and approve it before
submission.

Thanks,
Saeed.

>
Jason Gunthorpe Oct. 11, 2023, 11:10 p.m. UTC | #5
On Wed, Oct 11, 2023 at 11:45:04AM -0700, Saeed Mahameed wrote:
> > > The above works too. Maybe for consistency within probe_one() it would
> > > then make sense to also rename set_dma_caps() to mlx5_dma_init()?
> > 
> > Sounds great, thanks
> > 
> > BTW, I was informed offlist that Saeed also has fix to this issue,
> > but I don't know if he wants to progress with that fix as it has wrong
> > RCA in commit message and as an outcome of that much complex solution,
> > which is not necessary.
> > 
> > So I would be happy to see your patch with mlx5_dma_init().
> > 
> > Thanks
> > 
> 
> Actually I prefer the internal patch, it moves the dma parts out of
> mlx5_cmd_init() into mlx5_cmd_enable() which happens after dma caps are
> set. since it is using the current mlx5 function structure and breakdown, I
> prefer it over adding new function to the driver.
> 
> I will share the patch, I will let Niklas test it and approve it before
> submission.

Let's hurry please, mlx5 will be broken on S390 in rc1 if this is not
fixed soon.

Jason
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 15561965d2af..06744dedd928 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1908,10 +1908,6 @@  static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto adev_init_err;
 	}
 
-	err = mlx5_mdev_init(dev, prof_sel);
-	if (err)
-		goto mdev_init_err;
-
 	err = mlx5_pci_init(dev, pdev, id);
 	if (err) {
 		mlx5_core_err(dev, "mlx5_pci_init failed with error code %d\n",
@@ -1919,6 +1915,10 @@  static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto pci_init_err;
 	}
 
+	err = mlx5_mdev_init(dev, prof_sel);
+	if (err)
+		goto mdev_init_err;
+
 	err = mlx5_init_one(dev);
 	if (err) {
 		mlx5_core_err(dev, "mlx5_init_one failed with error code %d\n",
@@ -1939,10 +1939,10 @@  static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	return 0;
 
 err_init_one:
-	mlx5_pci_close(dev);
-pci_init_err:
 	mlx5_mdev_uninit(dev);
 mdev_init_err:
+	mlx5_pci_close(dev);
+pci_init_err:
 	mlx5_adev_idx_free(dev->priv.adev_idx);
 adev_init_err:
 	mlx5_devlink_free(devlink);