diff mbox series

[v2] iommu/arm-smmu: Defer probe of clients after smmu device bound

Message ID 20241004090428.2035-1-quic_pbrahma@quicinc.com (mailing list archive)
State New, archived
Headers show
Series [v2] iommu/arm-smmu: Defer probe of clients after smmu device bound | expand

Commit Message

Pratyush Brahma Oct. 4, 2024, 9:04 a.m. UTC
Null pointer dereference occurs due to a race between smmu
driver probe and client driver probe, when of_dma_configure()
for client is called after the iommu_device_register() for smmu driver
probe has executed but before the driver_bound() for smmu driver
has been called.

Following is how the race occurs:

T1:Smmu device probe		T2: Client device probe

really_probe()
arm_smmu_device_probe()
iommu_device_register()
					really_probe()
					platform_dma_configure()
					of_dma_configure()
					of_dma_configure_id()
					of_iommu_configure()
					iommu_probe_device()
					iommu_init_device()
					arm_smmu_probe_device()
					arm_smmu_get_by_fwnode()
						driver_find_device_by_fwnode()
						driver_find_device()
						next_device()
						klist_next()
						    /* null ptr
						       assigned to smmu */
					/* null ptr dereference
					   while smmu->streamid_mask */
driver_bound()
	klist_add_tail()

When this null smmu pointer is dereferenced later in
arm_smmu_probe_device, the device crashes.

Fix this by deferring the probe of the client device
until the smmu device has bound to the arm smmu driver.

Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
Cc: stable@vger.kernel.org
Co-developed-by: Prakash Gupta <quic_guptap@quicinc.com>
Signed-off-by: Prakash Gupta <quic_guptap@quicinc.com>
Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
---
Changes in v2:
 Fix kernel test robot warning
 Add stable kernel list in cc
 Link to v1: https://lore.kernel.org/all/20241001055633.21062-1-quic_pbrahma@quicinc.com/

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Pratyush Brahma Oct. 15, 2024, 1:31 p.m. UTC | #1
On 10/4/2024 2:34 PM, Pratyush Brahma wrote:
> Null pointer dereference occurs due to a race between smmu
> driver probe and client driver probe, when of_dma_configure()
> for client is called after the iommu_device_register() for smmu driver
> probe has executed but before the driver_bound() for smmu driver
> has been called.
>
> Following is how the race occurs:
>
> T1:Smmu device probe		T2: Client device probe
>
> really_probe()
> arm_smmu_device_probe()
> iommu_device_register()
> 					really_probe()
> 					platform_dma_configure()
> 					of_dma_configure()
> 					of_dma_configure_id()
> 					of_iommu_configure()
> 					iommu_probe_device()
> 					iommu_init_device()
> 					arm_smmu_probe_device()
> 					arm_smmu_get_by_fwnode()
> 						driver_find_device_by_fwnode()
> 						driver_find_device()
> 						next_device()
> 						klist_next()
> 						    /* null ptr
> 						       assigned to smmu */
> 					/* null ptr dereference
> 					   while smmu->streamid_mask */
> driver_bound()
> 	klist_add_tail()
>
> When this null smmu pointer is dereferenced later in
> arm_smmu_probe_device, the device crashes.
>
> Fix this by deferring the probe of the client device
> until the smmu device has bound to the arm smmu driver.
>
> Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
> Cc: stable@vger.kernel.org
> Co-developed-by: Prakash Gupta <quic_guptap@quicinc.com>
> Signed-off-by: Prakash Gupta <quic_guptap@quicinc.com>
> Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
> ---
> Changes in v2:
>   Fix kernel test robot warning
>   Add stable kernel list in cc
>   Link to v1: https://lore.kernel.org/all/20241001055633.21062-1-quic_pbrahma@quicinc.com/
>
>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 723273440c21..7c778b7eb8c8 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1437,6 +1437,9 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>   			goto out_free;
>   	} else {
>   		smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
> +		if (!smmu)
> +			return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
> +						"smmu dev has not bound yet\n"));
>   	}
>   
>   	ret = -EINVAL;


Hi
Can someone please review this patch? Let me know if any further 
information is required.

Thanks
Pratyush
Robin Murphy Oct. 17, 2024, 1:54 p.m. UTC | #2
On 15/10/2024 2:31 pm, Pratyush Brahma wrote:
> 
> On 10/4/2024 2:34 PM, Pratyush Brahma wrote:
>> Null pointer dereference occurs due to a race between smmu
>> driver probe and client driver probe, when of_dma_configure()
>> for client is called after the iommu_device_register() for smmu driver
>> probe has executed but before the driver_bound() for smmu driver
>> has been called.
>>
>> Following is how the race occurs:
>>
>> T1:Smmu device probe        T2: Client device probe
>>
>> really_probe()
>> arm_smmu_device_probe()
>> iommu_device_register()
>>                     really_probe()
>>                     platform_dma_configure()
>>                     of_dma_configure()
>>                     of_dma_configure_id()
>>                     of_iommu_configure()
>>                     iommu_probe_device()
>>                     iommu_init_device()
>>                     arm_smmu_probe_device()
>>                     arm_smmu_get_by_fwnode()
>>                         driver_find_device_by_fwnode()
>>                         driver_find_device()
>>                         next_device()
>>                         klist_next()
>>                             /* null ptr
>>                                assigned to smmu */
>>                     /* null ptr dereference
>>                        while smmu->streamid_mask */
>> driver_bound()
>>     klist_add_tail()
>>
>> When this null smmu pointer is dereferenced later in
>> arm_smmu_probe_device, the device crashes.
>>
>> Fix this by deferring the probe of the client device
>> until the smmu device has bound to the arm smmu driver.
>>
>> Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration 
>> support")
>> Cc: stable@vger.kernel.org
>> Co-developed-by: Prakash Gupta <quic_guptap@quicinc.com>
>> Signed-off-by: Prakash Gupta <quic_guptap@quicinc.com>
>> Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
>> ---
>> Changes in v2:
>>   Fix kernel test robot warning
>>   Add stable kernel list in cc
>>   Link to v1: 
>> https://lore.kernel.org/all/20241001055633.21062-1-quic_pbrahma@quicinc.com/
>>
>>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> index 723273440c21..7c778b7eb8c8 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -1437,6 +1437,9 @@ static struct iommu_device 
>> *arm_smmu_probe_device(struct device *dev)
>>               goto out_free;
>>       } else {
>>           smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
>> +        if (!smmu)
>> +            return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
>> +                        "smmu dev has not bound yet\n"));
>>       }
>>       ret = -EINVAL;
> 
> 
> Hi
> Can someone please review this patch? Let me know if any further 
> information is required.

This really shouldn't be leaking into drivers... :(

Honestly, I'm now so fed up of piling on hacks around the fundamental 
mis-design here, I think it's finally time to blow everything else off 
and spend a few days figuring out the most expedient way to fix it 
properly once and for all. Watch this space...

Thanks,
Robin.
Pratyush Brahma Oct. 27, 2024, 3:36 p.m. UTC | #3
On 10/17/2024 7:24 PM, Robin Murphy wrote:
> On 15/10/2024 2:31 pm, Pratyush Brahma wrote:
>>
>> On 10/4/2024 2:34 PM, Pratyush Brahma wrote:
>>> Null pointer dereference occurs due to a race between smmu
>>> driver probe and client driver probe, when of_dma_configure()
>>> for client is called after the iommu_device_register() for smmu driver
>>> probe has executed but before the driver_bound() for smmu driver
>>> has been called.
>>>
>>> Following is how the race occurs:
>>>
>>> T1:Smmu device probe        T2: Client device probe
>>>
>>> really_probe()
>>> arm_smmu_device_probe()
>>> iommu_device_register()
>>>                     really_probe()
>>>                     platform_dma_configure()
>>>                     of_dma_configure()
>>>                     of_dma_configure_id()
>>>                     of_iommu_configure()
>>>                     iommu_probe_device()
>>>                     iommu_init_device()
>>>                     arm_smmu_probe_device()
>>>                     arm_smmu_get_by_fwnode()
>>>                         driver_find_device_by_fwnode()
>>>                         driver_find_device()
>>>                         next_device()
>>>                         klist_next()
>>>                             /* null ptr
>>>                                assigned to smmu */
>>>                     /* null ptr dereference
>>>                        while smmu->streamid_mask */
>>> driver_bound()
>>>     klist_add_tail()
>>>
>>> When this null smmu pointer is dereferenced later in
>>> arm_smmu_probe_device, the device crashes.
>>>
>>> Fix this by deferring the probe of the client device
>>> until the smmu device has bound to the arm smmu driver.
>>>
>>> Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration 
>>> support")
>>> Cc: stable@vger.kernel.org
>>> Co-developed-by: Prakash Gupta <quic_guptap@quicinc.com>
>>> Signed-off-by: Prakash Gupta <quic_guptap@quicinc.com>
>>> Signed-off-by: Pratyush Brahma <quic_pbrahma@quicinc.com>
>>> ---
>>> Changes in v2:
>>>   Fix kernel test robot warning
>>>   Add stable kernel list in cc
>>>   Link to v1: 
>>> https://lore.kernel.org/all/20241001055633.21062-1-quic_pbrahma@quicinc.com/
>>>
>>>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
>>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> index 723273440c21..7c778b7eb8c8 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> @@ -1437,6 +1437,9 @@ static struct iommu_device 
>>> *arm_smmu_probe_device(struct device *dev)
>>>               goto out_free;
>>>       } else {
>>>           smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
>>> +        if (!smmu)
>>> +            return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
>>> +                        "smmu dev has not bound yet\n"));
>>>       }
>>>       ret = -EINVAL;
>>
>>
>> Hi
>> Can someone please review this patch? Let me know if any further 
>> information is required.
>
> This really shouldn't be leaking into drivers... :(
>
> Honestly, I'm now so fed up of piling on hacks around the fundamental 
> mis-design here, I think it's finally time to blow everything else off 
> and spend a few days figuring out the most expedient way to fix it 
> properly once and for all. Watch this space...
>
> Thanks,
> Robin.

Hi Robin

Do you have any approaches in mind that you are currently considering or 
exploring?

Thanks
Pratyush
Will Deacon Oct. 29, 2024, 4:15 p.m. UTC | #4
On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
> Null pointer dereference occurs due to a race between smmu
> driver probe and client driver probe, when of_dma_configure()
> for client is called after the iommu_device_register() for smmu driver
> probe has executed but before the driver_bound() for smmu driver
> has been called.
> 
> Following is how the race occurs:
> 
> [...]

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
      https://git.kernel.org/will/c/229e6ee43d2a

Cheers,
Robin Murphy Nov. 7, 2024, 3:01 p.m. UTC | #5
On 29/10/2024 4:15 pm, Will Deacon wrote:
> On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
>> Null pointer dereference occurs due to a race between smmu
>> driver probe and client driver probe, when of_dma_configure()
>> for client is called after the iommu_device_register() for smmu driver
>> probe has executed but before the driver_bound() for smmu driver
>> has been called.
>>
>> Following is how the race occurs:
>>
>> [...]
> 
> Applied to will (for-joerg/arm-smmu/updates), thanks!
> 
> [1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
>        https://git.kernel.org/will/c/229e6ee43d2a

I've finally got to the point of proving to myself that this isn't the
right fix, since once we do get __iommu_probe_device() working properly
in the correct order, iommu_device_register() then runs into the same
condition itself. Diff below should make this issue go away - I'll write
up proper patches once I've tested it a little more.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..b7dcb1494aa4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver;
  static
  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
  {
-	struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver,
-							  fwnode);
+	struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
+
  	put_device(dev);
  	return dev ? dev_get_drvdata(dev) : NULL;
  }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 8321962b3714..aba315aa6848 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
  static
  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
  {
-	struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver,
-							  fwnode);
+	struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
+
  	put_device(dev);
  	return dev ? dev_get_drvdata(dev) : NULL;
  }
@@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
  					i, irq);
  	}
  
-	err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
-				     "smmu.%pa", &smmu->ioaddr);
-	if (err) {
-		dev_err(dev, "Failed to register iommu in sysfs\n");
-		return err;
-	}
-
-	err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
-				    using_legacy_binding ? NULL : dev);
-	if (err) {
-		dev_err(dev, "Failed to register iommu\n");
-		iommu_device_sysfs_remove(&smmu->iommu);
-		return err;
-	}
-
  	platform_set_drvdata(pdev, smmu);
  
  	/* Check for RMRs and install bypass SMRs if any */
@@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
  	arm_smmu_device_reset(smmu);
  	arm_smmu_test_smr_masks(smmu);
  
+	err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
+				     "smmu.%pa", &smmu->ioaddr);
+	if (err)
+		return dev_err_probe(dev, err, "Failed to register iommu in sysfs\n");
+
+	err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
+				    using_legacy_binding ? NULL : dev);
+	if (err) {
+		iommu_device_sysfs_remove(&smmu->iommu);
+		return dev_err_probe(dev, err, "Failed to register iommu\n");
+	}
+
  	/*
  	 * We want to avoid touching dev->power.lock in fastpaths unless
  	 * it's really going to do something useful - pm_runtime_enabled()
Pratyush Brahma Nov. 19, 2024, 7:10 p.m. UTC | #6
On 11/7/2024 8:31 PM, Robin Murphy wrote:
> On 29/10/2024 4:15 pm, Will Deacon wrote:
>> On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
>>> Null pointer dereference occurs due to a race between smmu
>>> driver probe and client driver probe, when of_dma_configure()
>>> for client is called after the iommu_device_register() for smmu driver
>>> probe has executed but before the driver_bound() for smmu driver
>>> has been called.
>>>
>>> Following is how the race occurs:
>>>
>>> [...]
>>
>> Applied to will (for-joerg/arm-smmu/updates), thanks!
>>
>> [1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
>>        https://git.kernel.org/will/c/229e6ee43d2a
>
> I've finally got to the point of proving to myself that this isn't the
> right fix, since once we do get __iommu_probe_device() working properly
> in the correct order, iommu_device_register() then runs into the same
> condition itself. Diff below should make this issue go away - I'll write
> up proper patches once I've tested it a little more.
>
> Thanks,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..b7dcb1494aa4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver;
>  static
>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
> *fwnode)
>  {
> -    struct device *dev = 
> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
> -                              fwnode);
> +    struct device *dev = 
> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
> +      put_device(dev);
>      return dev ? dev_get_drvdata(dev) : NULL;
>  }
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 8321962b3714..aba315aa6848 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device *dev, 
> enum iommu_cap cap)
>  static
>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
> *fwnode)
>  {
> -    struct device *dev = 
> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
> -                              fwnode);
> +    struct device *dev = 
> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
I think it would still follow this path:

bus_find_device_by_fwnode() -> bus_find_device() -> next_device()

next_device() would always return null until the driver is bound to the 
device which
happens much later in really_probe() after the iommu_device_register() 
would be called
even as per this patch. That way the race would still occur, wouldn't it?
Can you please help me understand what I may be missing here?
Are you saying that these additional patches are required along with the 
fix I've
posted?
> +
>      put_device(dev);
>      return dev ? dev_get_drvdata(dev) : NULL;
>  }
> @@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct 
> platform_device *pdev)
>                      i, irq);
>      }
>
> -    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
> -                     "smmu.%pa", &smmu->ioaddr);
> -    if (err) {
> -        dev_err(dev, "Failed to register iommu in sysfs\n");
> -        return err;
> -    }
> -
> -    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
> -                    using_legacy_binding ? NULL : dev);
> -    if (err) {
> -        dev_err(dev, "Failed to register iommu\n");
> -        iommu_device_sysfs_remove(&smmu->iommu);
> -        return err;
> -    }
> -
>      platform_set_drvdata(pdev, smmu);
>
>      /* Check for RMRs and install bypass SMRs if any */
> @@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct 
> platform_device *pdev)
>      arm_smmu_device_reset(smmu);
>      arm_smmu_test_smr_masks(smmu);
>
> +    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
> +                     "smmu.%pa", &smmu->ioaddr);
> +    if (err)
> +        return dev_err_probe(dev, err, "Failed to register iommu in 
> sysfs\n");
> +
> +    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
> +                    using_legacy_binding ? NULL : dev);
> +    if (err) {
> +        iommu_device_sysfs_remove(&smmu->iommu);
> +        return dev_err_probe(dev, err, "Failed to register iommu\n");
> +    }
> +
>      /*
>       * We want to avoid touching dev->power.lock in fastpaths unless
>       * it's really going to do something useful - pm_runtime_enabled()
Robin Murphy Nov. 21, 2024, 2:49 p.m. UTC | #7
On 2024-11-19 7:10 pm, Pratyush Brahma wrote:
> 
> On 11/7/2024 8:31 PM, Robin Murphy wrote:
>> On 29/10/2024 4:15 pm, Will Deacon wrote:
>>> On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
>>>> Null pointer dereference occurs due to a race between smmu
>>>> driver probe and client driver probe, when of_dma_configure()
>>>> for client is called after the iommu_device_register() for smmu driver
>>>> probe has executed but before the driver_bound() for smmu driver
>>>> has been called.
>>>>
>>>> Following is how the race occurs:
>>>>
>>>> [...]
>>>
>>> Applied to will (for-joerg/arm-smmu/updates), thanks!
>>>
>>> [1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
>>>        https://git.kernel.org/will/c/229e6ee43d2a
>>
>> I've finally got to the point of proving to myself that this isn't the
>> right fix, since once we do get __iommu_probe_device() working properly
>> in the correct order, iommu_device_register() then runs into the same
>> condition itself. Diff below should make this issue go away - I'll write
>> up proper patches once I've tested it a little more.
>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/ 
>> iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 737c5b882355..b7dcb1494aa4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver;
>>  static
>>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
>> *fwnode)
>>  {
>> -    struct device *dev = 
>> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
>> -                              fwnode);
>> +    struct device *dev = 
>> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
>> +      put_device(dev);
>>      return dev ? dev_get_drvdata(dev) : NULL;
>>  }
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/ 
>> arm/arm-smmu/arm-smmu.c
>> index 8321962b3714..aba315aa6848 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>> @@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device *dev, 
>> enum iommu_cap cap)
>>  static
>>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
>> *fwnode)
>>  {
>> -    struct device *dev = 
>> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
>> -                              fwnode);
>> +    struct device *dev = 
>> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
> I think it would still follow this path:
> 
> bus_find_device_by_fwnode() -> bus_find_device() -> next_device()
> 
> next_device() would always return null until the driver is bound to the 
> device which

No, this is traversing the bus list, *not* the driver list, that's the 
whole point. The SMMU device must exist on the platform bus before the 
driver can bind, since the bus is responsible for matching the driver in 
the first place.

> happens much later in really_probe() after the iommu_device_register() 
> would be called
> even as per this patch. That way the race would still occur, wouldn't it?
> Can you please help me understand what I may be missing here?
> Are you saying that these additional patches are required along with the 
> fix I've
> posted?

I'm saying my change makes there be no race, i.e. the "if (!smmu)" case 
can never be true, and so no longer needs working around.

Thanks,
Robin.

>> +
>>      put_device(dev);
>>      return dev ? dev_get_drvdata(dev) : NULL;
>>  }
>> @@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct 
>> platform_device *pdev)
>>                      i, irq);
>>      }
>>
>> -    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
>> -                     "smmu.%pa", &smmu->ioaddr);
>> -    if (err) {
>> -        dev_err(dev, "Failed to register iommu in sysfs\n");
>> -        return err;
>> -    }
>> -
>> -    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
>> -                    using_legacy_binding ? NULL : dev);
>> -    if (err) {
>> -        dev_err(dev, "Failed to register iommu\n");
>> -        iommu_device_sysfs_remove(&smmu->iommu);
>> -        return err;
>> -    }
>> -
>>      platform_set_drvdata(pdev, smmu);
>>
>>      /* Check for RMRs and install bypass SMRs if any */
>> @@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct 
>> platform_device *pdev)
>>      arm_smmu_device_reset(smmu);
>>      arm_smmu_test_smr_masks(smmu);
>>
>> +    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
>> +                     "smmu.%pa", &smmu->ioaddr);
>> +    if (err)
>> +        return dev_err_probe(dev, err, "Failed to register iommu in 
>> sysfs\n");
>> +
>> +    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
>> +                    using_legacy_binding ? NULL : dev);
>> +    if (err) {
>> +        iommu_device_sysfs_remove(&smmu->iommu);
>> +        return dev_err_probe(dev, err, "Failed to register iommu\n");
>> +    }
>> +
>>      /*
>>       * We want to avoid touching dev->power.lock in fastpaths unless
>>       * it's really going to do something useful - pm_runtime_enabled()
>
Pratyush Brahma Nov. 21, 2024, 9:05 p.m. UTC | #8
On 11/21/2024 8:19 PM, Robin Murphy wrote:
> On 2024-11-19 7:10 pm, Pratyush Brahma wrote:
>>
>> On 11/7/2024 8:31 PM, Robin Murphy wrote:
>>> On 29/10/2024 4:15 pm, Will Deacon wrote:
>>>> On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
>>>>> Null pointer dereference occurs due to a race between smmu
>>>>> driver probe and client driver probe, when of_dma_configure()
>>>>> for client is called after the iommu_device_register() for smmu 
>>>>> driver
>>>>> probe has executed but before the driver_bound() for smmu driver
>>>>> has been called.
>>>>>
>>>>> Following is how the race occurs:
>>>>>
>>>>> [...]
>>>>
>>>> Applied to will (for-joerg/arm-smmu/updates), thanks!
>>>>
>>>> [1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound
>>>>        https://git.kernel.org/will/c/229e6ee43d2a
>>>
>>> I've finally got to the point of proving to myself that this isn't the
>>> right fix, since once we do get __iommu_probe_device() working properly
>>> in the correct order, iommu_device_register() then runs into the same
>>> condition itself. Diff below should make this issue go away - I'll 
>>> write
>>> up proper patches once I've tested it a little more.
>>>
>>> Thanks,
>>> Robin.
>>>
>>> ----->8-----
>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/ 
>>> iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> index 737c5b882355..b7dcb1494aa4 100644
>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>>> @@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver;
>>>  static
>>>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
>>> *fwnode)
>>>  {
>>> -    struct device *dev = 
>>> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
>>> -                              fwnode);
>>> +    struct device *dev = 
>>> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
>>> +      put_device(dev);
>>>      return dev ? dev_get_drvdata(dev) : NULL;
>>>  }
>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/ 
>>> arm/arm-smmu/arm-smmu.c
>>> index 8321962b3714..aba315aa6848 100644
>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
>>> @@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device 
>>> *dev, enum iommu_cap cap)
>>>  static
>>>  struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle 
>>> *fwnode)
>>>  {
>>> -    struct device *dev = 
>>> driver_find_device_by_fwnode(&arm_smmu_driver.driver,
>>> -                              fwnode);
>>> +    struct device *dev = 
>>> bus_find_device_by_fwnode(&platform_bus_type, fwnode);
>> I think it would still follow this path:
>>
>> bus_find_device_by_fwnode() -> bus_find_device() -> next_device()
>>
>> next_device() would always return null until the driver is bound to 
>> the device which
>
> No, this is traversing the bus list, *not* the driver list, that's the 
> whole point. The SMMU device must exist on the platform bus before the 
> driver can bind, since the bus is responsible for matching the driver 
> in the first place.
Ah I see. Thanks for the explanation. That makes sense.
>
>> happens much later in really_probe() after the 
>> iommu_device_register() would be called
>> even as per this patch. That way the race would still occur, wouldn't 
>> it?
>> Can you please help me understand what I may be missing here?
>> Are you saying that these additional patches are required along with 
>> the fix I've
>> posted?
>
> I'm saying my change makes there be no race, i.e. the "if (!smmu)" 
> case can never be true, and so no longer needs working around.
Got it. Thanks.
>
> Thanks,
> Robin.
>
>>> +
>>>      put_device(dev);
>>>      return dev ? dev_get_drvdata(dev) : NULL;
>>>  }
>>> @@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct 
>>> platform_device *pdev)
>>>                      i, irq);
>>>      }
>>>
>>> -    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
>>> -                     "smmu.%pa", &smmu->ioaddr);
>>> -    if (err) {
>>> -        dev_err(dev, "Failed to register iommu in sysfs\n");
>>> -        return err;
>>> -    }
>>> -
>>> -    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
>>> -                    using_legacy_binding ? NULL : dev);
>>> -    if (err) {
>>> -        dev_err(dev, "Failed to register iommu\n");
>>> -        iommu_device_sysfs_remove(&smmu->iommu);
>>> -        return err;
>>> -    }
>>> -
>>>      platform_set_drvdata(pdev, smmu);
>>>
>>>      /* Check for RMRs and install bypass SMRs if any */
>>> @@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct 
>>> platform_device *pdev)
>>>      arm_smmu_device_reset(smmu);
>>>      arm_smmu_test_smr_masks(smmu);
>>>
>>> +    err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL,
>>> +                     "smmu.%pa", &smmu->ioaddr);
>>> +    if (err)
>>> +        return dev_err_probe(dev, err, "Failed to register iommu in 
>>> sysfs\n");
>>> +
>>> +    err = iommu_device_register(&smmu->iommu, &arm_smmu_ops,
>>> +                    using_legacy_binding ? NULL : dev);
>>> +    if (err) {
>>> +        iommu_device_sysfs_remove(&smmu->iommu);
>>> +        return dev_err_probe(dev, err, "Failed to register iommu\n");
>>> +    }
>>> +
>>>      /*
>>>       * We want to avoid touching dev->power.lock in fastpaths unless
>>>       * it's really going to do something useful - pm_runtime_enabled()
>>
>
diff mbox series

Patch

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 723273440c21..7c778b7eb8c8 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1437,6 +1437,9 @@  static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 			goto out_free;
 	} else {
 		smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+		if (!smmu)
+			return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
+						"smmu dev has not bound yet\n"));
 	}
 
 	ret = -EINVAL;