Message ID | 894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iommu: Retire bus_set_iommu() | expand |
On 2022/7/6 01:08, Robin Murphy wrote: > Currently we rely on registering all our instances before initially > allowing any .probe_device calls via bus_set_iommu(). In preparation for > phasing out the latter, make sure we won't inadvertently return success > for a device associated with a known but not yet registered instance, > otherwise we'll run straight into iommu_group_get_for_dev() trying to > use NULL ops. > > That also highlights an issue with intel_iommu_get_resv_regions() taking > dmar_global_lock from within a section where intel_iommu_init() already > holds it, which already exists via probe_acpi_namespace_devices() when > an ANDD device is probed, but gets more obvious with the upcoming change > to iommu_device_register(). Since they are both read locks it manages > not to deadlock in practice, so I'm leaving it here for someone with > more confidence to tackle a larger rework of the locking. Thanks for highlighting this. I will look into it later. Best regards, baolu > > Signed-off-by: Robin Murphy <robin.murphy@arm.com> > --- > > v3: New > > drivers/iommu/intel/iommu.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 44016594831d..3e02c08802a0 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -4600,7 +4600,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) > u8 bus, devfn; > > iommu = device_to_iommu(dev, &bus, &devfn); > - if (!iommu) > + if (!iommu || !iommu->iommu.ops) > return ERR_PTR(-ENODEV); > > info = kzalloc(sizeof(*info), GFP_KERNEL);
> From: Robin Murphy <robin.murphy@arm.com> > Sent: Wednesday, July 6, 2022 1:08 AM > > Currently we rely on registering all our instances before initially > allowing any .probe_device calls via bus_set_iommu(). In preparation for > phasing out the latter, make sure we won't inadvertently return success > for a device associated with a known but not yet registered instance, > otherwise we'll run straight into iommu_group_get_for_dev() trying to > use NULL ops. > > That also highlights an issue with intel_iommu_get_resv_regions() taking > dmar_global_lock from within a section where intel_iommu_init() already > holds it, which already exists via probe_acpi_namespace_devices() when > an ANDD device is probed, but gets more obvious with the upcoming change > to iommu_device_register(). Since they are both read locks it manages > not to deadlock in practice, so I'm leaving it here for someone with > more confidence to tackle a larger rework of the locking. > > Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
On 2022/7/6 01:08, Robin Murphy wrote: > That also highlights an issue with intel_iommu_get_resv_regions() taking > dmar_global_lock from within a section where intel_iommu_init() already > holds it, which already exists via probe_acpi_namespace_devices() when > an ANDD device is probed, but gets more obvious with the upcoming change > to iommu_device_register(). Since they are both read locks it manages > not to deadlock in practice, so I'm leaving it here for someone with > more confidence to tackle a larger rework of the locking. I am trying to reproduce this problem. Strangely, even if I selected CONFIG_LOCKDEP=y, the kernel didn't complain anything. :-) In fact the rmrr list in the Intel IOMMU driver is always static after parsing the ACPI/DMAR tables. There's no need to protect it with a lock. Hence we can safely remove below down/up_read(). 4512 static void intel_iommu_get_resv_regions(struct device *device, 4513 struct list_head *head) 4514 { 4515 int prot = DMA_PTE_READ | DMA_PTE_WRITE; 4516 struct iommu_resv_region *reg; 4517 struct dmar_rmrr_unit *rmrr; 4518 struct device *i_dev; 4519 int i; 4520 4521 down_read(&dmar_global_lock); 4522 for_each_rmrr_units(rmrr) { 4523 for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt, 4524 i, i_dev) { Best regards, baolu
On 2022-07-08 08:52, Baolu Lu wrote: > On 2022/7/6 01:08, Robin Murphy wrote: >> That also highlights an issue with intel_iommu_get_resv_regions() taking >> dmar_global_lock from within a section where intel_iommu_init() already >> holds it, which already exists via probe_acpi_namespace_devices() when >> an ANDD device is probed, but gets more obvious with the upcoming change >> to iommu_device_register(). Since they are both read locks it manages >> not to deadlock in practice, so I'm leaving it here for someone with >> more confidence to tackle a larger rework of the locking. > > I am trying to reproduce this problem. Strangely, even if I selected > CONFIG_LOCKDEP=y, the kernel didn't complain anything. :-) FWIW, see below for the full report I get with this series applied (my machine doesn't have any ANDD entries to trigger the existing case). > In fact the rmrr list in the Intel IOMMU driver is always static after > parsing the ACPI/DMAR tables. There's no need to protect it with a lock. > Hence we can safely remove below down/up_read(). IIRC that leads to RCU warnings via for_each_dev_scope(), though. I did try replacing this down_read() with rcu_read_lock(), but then it doesn't like the GFP_KERNEL allocation in iommu_alloc_resv_region(), and that's where I gave up :) I'm mostly left wondering whether the dmar_drhd_units list really needs to be RCU protected at all, as that seems to be the root of most of the problems here. Cheers, Robin. > 4512 static void intel_iommu_get_resv_regions(struct device *device, > 4513 struct list_head *head) > 4514 { > 4515 int prot = DMA_PTE_READ | DMA_PTE_WRITE; > 4516 struct iommu_resv_region *reg; > 4517 struct dmar_rmrr_unit *rmrr; > 4518 struct device *i_dev; > 4519 int i; > 4520 > 4521 down_read(&dmar_global_lock); > 4522 for_each_rmrr_units(rmrr) { > 4523 for_each_active_dev_scope(rmrr->devices, > rmrr->devices_cnt, > 4524 i, i_dev) { > > Best regards, > baolu ---->8----- [ 11.421712] pci 0000:00:1b.0: Adding to iommu group 0 [ 11.421977] [ 11.421978] ============================================ [ 11.421979] WARNING: possible recursive locking detected [ 11.421981] 5.19.0-rc3-00015-gdc44a2269276 #32 Not tainted [ 11.421984] -------------------------------------------- [ 11.421985] swapper/0/1 is trying to acquire lock: [ 11.421986] ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_get_resv_regions+0x28/0x3a0 [ 11.422000] [ 11.422000] but task is already holding lock: [ 11.422001] ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_init+0x1638/0x1a08 [ 11.422011] [ 11.422011] other info that might help us debug this: [ 11.422013] Possible unsafe locking scenario: [ 11.422013] [ 11.422013] CPU0 [ 11.422014] ---- [ 11.422015] lock(dmar_global_lock); [ 11.422018] lock(dmar_global_lock); [ 11.422020] [ 11.422020] *** DEADLOCK *** [ 11.422020] [ 11.422021] May be due to missing lock nesting notation [ 11.422021] [ 11.422022] 2 locks held by swapper/0/1: [ 11.422024] #0: ffffffffb987b770 (dmar_global_lock){++++}-{3:3}, at: intel_iommu_init+0x1638/0x1a08 [ 11.422033] #1: ffff8881077a10c0 (&group->mutex){+.+.}-{3:3}, at: bus_iommu_probe+0x139/0x4c0 [ 11.422044] [ 11.422044] stack backtrace: [ 11.422046] CPU: 8 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00015-gdc44a2269276 #32 [ 11.422050] Hardware name: LENOVO 30B6S08J03/1030, BIOS S01KT29A 06/20/2016 [ 11.422052] Call Trace: [ 11.422054] <TASK> [ 11.422056] dump_stack_lvl+0x45/0x59 [ 11.422064] __lock_acquire.cold+0x131/0x305 [ 11.422075] ? lockdep_hardirqs_on_prepare+0x220/0x220 [ 11.422082] ? lock_is_held_type+0xd7/0x130 [ 11.422090] ? rcu_read_lock_sched_held+0x9c/0xd0 [ 11.422098] lock_acquire+0x165/0x410 [ 11.422102] ? intel_iommu_get_resv_regions+0x28/0x3a0 [ 11.422108] ? lock_release+0x410/0x410 [ 11.422112] ? __iommu_domain_alloc+0xc5/0x130 [ 11.422116] ? iommu_group_alloc_default_domain+0x16/0x90 [ 11.422121] ? bus_iommu_probe+0x26d/0x4c0 [ 11.422126] ? iommu_device_register+0x11e/0x160 [ 11.422130] ? intel_iommu_init+0x16e0/0x1a08 [ 11.422135] ? do_one_initcall+0xb6/0x3c0 [ 11.422140] ? lock_is_held_type+0xd7/0x130 [ 11.422147] down_read+0x97/0x2f0 [ 11.422152] ? intel_iommu_get_resv_regions+0x28/0x3a0 [ 11.422156] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 11.422161] ? rwsem_down_read_slowpath+0xa10/0xa10 [ 11.422166] ? find_held_lock+0x85/0xa0 [ 11.422173] intel_iommu_get_resv_regions+0x28/0x3a0 [ 11.422178] ? rcu_read_lock_sched_held+0x9c/0xd0 [ 11.422185] iommu_create_device_direct_mappings.isra.0+0x11a/0x330 [ 11.422193] ? iommu_map+0x50/0x50 [ 11.422198] ? __iommu_domain_alloc+0xc5/0x130 [ 11.422205] bus_iommu_probe+0x2bc/0x4c0 [ 11.422210] ? iommu_device_register+0xba/0x160 [ 11.422216] ? iommu_group_default_domain+0x20/0x20 [ 11.422221] ? do_raw_spin_lock+0x114/0x1d0 [ 11.422226] ? rwlock_bug.part.0+0x50/0x50 [ 11.422231] ? rwsem_down_read_slowpath+0xa10/0xa10 [ 11.422239] iommu_device_register+0x11e/0x160 [ 11.422244] intel_iommu_init+0x16e0/0x1a08 [ 11.422249] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 11.422254] ? _raw_spin_unlock_irqrestore+0x28/0x50 [ 11.422261] ? lock_release+0x240/0x410 [ 11.422265] ? populate_rootfs+0x26/0x37 [ 11.422272] ? lock_downgrade+0x3a0/0x3a0 [ 11.422277] ? dmar_parse_one_rmrr+0x203/0x203 [ 11.422281] ? lock_is_held_type+0xd7/0x130 [ 11.422286] ? iommu_setup+0x282/0x282 [ 11.422291] ? rcu_read_lock_sched_held+0x9c/0xd0 [ 11.422296] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 11.422301] ? up_write+0xd3/0x260 [ 11.422305] ? iommu_setup+0x282/0x282 [ 11.422309] pci_iommu_init+0x15/0x39 [ 11.422313] do_one_initcall+0xb6/0x3c0 [ 11.422317] ? initcall_blacklisted+0x120/0x120 [ 11.422322] ? rcu_read_lock_sched_held+0x9c/0xd0 [ 11.422327] ? rcu_read_lock_bh_held+0xb0/0xb0 [ 11.422331] ? kasan_unpoison+0x23/0x50 [ 11.422337] ? __kasan_slab_alloc+0x2c/0x80 [ 11.422344] kernel_init_freeable+0x330/0x389 [ 11.422349] ? rest_init+0x1b0/0x1b0 [ 11.422354] kernel_init+0x14/0x130 [ 11.422359] ret_from_fork+0x22/0x30 [ 11.422367] </TASK>
Hi Robin, On 2022/7/15 20:37, Robin Murphy wrote: >> In fact the rmrr list in the Intel IOMMU driver is always static after >> parsing the ACPI/DMAR tables. There's no need to protect it with a lock. >> Hence we can safely remove below down/up_read(). > > IIRC that leads to RCU warnings via for_each_dev_scope(), though. I did > try replacing this down_read() with rcu_read_lock(), but then it doesn't > like the GFP_KERNEL allocation in iommu_alloc_resv_region(), and that's > where I gave up :) > > I'm mostly left wondering whether the dmar_drhd_units list really needs > to be RCU protected at all, as that seems to be the root of most of the > problems here. I just posted a fix patch here: https://lore.kernel.org/linux-iommu/20220718235325.3952426-1-baolu.lu@linux.intel.com/ It can remove the recursive locking and RCU warnings. Can you please take a look at it? Best regards, baolu
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 44016594831d..3e02c08802a0 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4600,7 +4600,7 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev) u8 bus, devfn; iommu = device_to_iommu(dev, &bus, &devfn); - if (!iommu) + if (!iommu || !iommu->iommu.ops) return ERR_PTR(-ENODEV); info = kzalloc(sizeof(*info), GFP_KERNEL);
Currently we rely on registering all our instances before initially allowing any .probe_device calls via bus_set_iommu(). In preparation for phasing out the latter, make sure we won't inadvertently return success for a device associated with a known but not yet registered instance, otherwise we'll run straight into iommu_group_get_for_dev() trying to use NULL ops. That also highlights an issue with intel_iommu_get_resv_regions() taking dmar_global_lock from within a section where intel_iommu_init() already holds it, which already exists via probe_acpi_namespace_devices() when an ANDD device is probed, but gets more obvious with the upcoming change to iommu_device_register(). Since they are both read locks it manages not to deadlock in practice, so I'm leaving it here for someone with more confidence to tackle a larger rework of the locking. Signed-off-by: Robin Murphy <robin.murphy@arm.com> --- v3: New drivers/iommu/intel/iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)