Message ID | 20240628151845.22166-1-yan.y.zhao@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio: Get/put KVM only for the first/last vfio_df_open/close in cdev path | expand |
On 2024/6/28 23:18, Yan Zhao wrote: > In the device cdev path, adjust the handling of the KVM reference count to > only increment with the first vfio_df_open() and decrement after the final > vfio_df_close(). This change addresses a KVM reference leak that occurs > when a device cdev file is opened multiple times and attempts to bind to > iommufd repeatedly. > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > in the cdev path during iommufd binding. The corresponding > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > error occurs during the binding process. > > However, issues arise when a device binds to iommufd more than once. The > second vfio_df_open() will fail during iommufd binding, and > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > Consequently, when iommufd is unbound from the first successfully bound > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > KVM reference count. Good catch!!! > Below is the calltrace that will be produced in this scenario when the KVM > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > Call Trace: > <TASK> > dump_stack_lvl+0x80/0xc0 > slab_err+0xb0/0xf0 > ? __kmem_cache_shutdown+0xc1/0x4e0 > ? rcu_is_watching+0x11/0x50 > ? lock_acquired+0x144/0x3c0 > __kmem_cache_shutdown+0x1b7/0x4e0 > kmem_cache_destroy+0xa6/0x260 > kvm_exit+0x80/0xc0 [kvm] > vmx_exit+0xe/0x20 [kvm_intel] > __x64_sys_delete_module+0x143/0x250 > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > ? syscall_trace_enter+0x143/0x210 > do_syscall_64+0x6f/0x140 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > --- > drivers/vfio/device_cdev.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > index bb1817bd4ff3..3b85d01d1b27 100644 > --- a/drivers/vfio/device_cdev.c > +++ b/drivers/vfio/device_cdev.c > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > { > struct vfio_device *device = df->device; > struct vfio_device_bind_iommufd bind; > + bool put_kvm = false; > unsigned long minsz; > int ret; > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > } > > /* > - * Before the device open, get the KVM pointer currently > + * Before the device's first open, get the KVM pointer currently > * associated with the device file (if there is) and obtain > - * a reference. This reference is held until device closed. > + * a reference. This reference is held until device's last closed. > * Save the pointer in the device for use by drivers. > */ > - vfio_df_get_kvm_safe(df); > + if (device->open_count == 0) { > + vfio_df_get_kvm_safe(df); > + put_kvm = true; > + } > > ret = vfio_df_open(df); > if (ret) > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > out_close_device: > vfio_df_close(df); > out_put_kvm: > - vfio_device_put_kvm(device); > + if (put_kvm) you may use if (device->open_count == 0) as well here to save a bool. Otherwise looks good to me. Reviewed-by: Yi Liu <yi.l.liu@intel.com> > + vfio_device_put_kvm(device); > iommufd_ctx_put(df->iommufd); > df->iommufd = NULL; > out_unlock: > > base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f BTW. The vfio_device_get_kvm_safe() is not supposed to be invoked multiple times by design as the second call would override the device->put_kvm and device->kvm. This does not change the put_kvm nor the kvm though. But not a good thing anyghow. maybe worth a warn like below. diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index ee72c0b61795..a4bead0e5820 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -408,6 +408,8 @@ void vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm) if (!kvm) return; + WARN_ON(device->put_kvm || device->kvm); + pfn = symbol_get(kvm_put_kvm); if (WARN_ON(!pfn)) return;
On Mon, Jul 01, 2024 at 01:08:30PM +0800, Yi Liu wrote: > On 2024/6/28 23:18, Yan Zhao wrote: > > In the device cdev path, adjust the handling of the KVM reference count to > > only increment with the first vfio_df_open() and decrement after the final > > vfio_df_close(). This change addresses a KVM reference leak that occurs > > when a device cdev file is opened multiple times and attempts to bind to > > iommufd repeatedly. > > > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > > in the cdev path during iommufd binding. The corresponding > > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > > error occurs during the binding process. > > > > However, issues arise when a device binds to iommufd more than once. The > > second vfio_df_open() will fail during iommufd binding, and > > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > > Consequently, when iommufd is unbound from the first successfully bound > > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > > KVM reference count. > > Good catch!!! > > > Below is the calltrace that will be produced in this scenario when the KVM > > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > > > Call Trace: > > <TASK> > > dump_stack_lvl+0x80/0xc0 > > slab_err+0xb0/0xf0 > > ? __kmem_cache_shutdown+0xc1/0x4e0 > > ? rcu_is_watching+0x11/0x50 > > ? lock_acquired+0x144/0x3c0 > > __kmem_cache_shutdown+0x1b7/0x4e0 > > kmem_cache_destroy+0xa6/0x260 > > kvm_exit+0x80/0xc0 [kvm] > > vmx_exit+0xe/0x20 [kvm_intel] > > __x64_sys_delete_module+0x143/0x250 > > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > > ? syscall_trace_enter+0x143/0x210 > > do_syscall_64+0x6f/0x140 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > > --- > > drivers/vfio/device_cdev.c | 13 +++++++++---- > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > > index bb1817bd4ff3..3b85d01d1b27 100644 > > --- a/drivers/vfio/device_cdev.c > > +++ b/drivers/vfio/device_cdev.c > > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > > { > > struct vfio_device *device = df->device; > > struct vfio_device_bind_iommufd bind; > > + bool put_kvm = false; > > unsigned long minsz; > > int ret; > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > > } > > /* > > - * Before the device open, get the KVM pointer currently > > + * Before the device's first open, get the KVM pointer currently > > * associated with the device file (if there is) and obtain > > - * a reference. This reference is held until device closed. > > + * a reference. This reference is held until device's last closed. > > * Save the pointer in the device for use by drivers. > > */ > > - vfio_df_get_kvm_safe(df); > > + if (device->open_count == 0) { > > + vfio_df_get_kvm_safe(df); > > + put_kvm = true; > > + } > > ret = vfio_df_open(df); > > if (ret) > > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, > > out_close_device: > > vfio_df_close(df); > > out_put_kvm: > > - vfio_device_put_kvm(device); > > + if (put_kvm) > > you may use if (device->open_count == 0) as well here to save a bool. > Otherwise looks good to me. Upon here, device->open_count is not necessarily 0 even for the first open. The failure can be after a successful increment of device->open_count. Maybe renaming "bool put_kvm" to "bool is_first_open" to save an assignment in most common case? > Reviewed-by: Yi Liu <yi.l.liu@intel.com> Thanks:) > > > + vfio_device_put_kvm(device); > > iommufd_ctx_put(df->iommufd); > > df->iommufd = NULL; > > out_unlock: > > > > base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f > > BTW. The vfio_device_get_kvm_safe() is not supposed to be invoked multiple > times by design as the second call would override the device->put_kvm and > device->kvm. This does not change the put_kvm nor the kvm though. But not a "kvm" may also be changed if the second bind is from a different VM. > good thing anyghow. maybe worth a warn like below. > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index ee72c0b61795..a4bead0e5820 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -408,6 +408,8 @@ void vfio_device_get_kvm_safe(struct vfio_device > *device, struct kvm *kvm) > if (!kvm) > return; > > + WARN_ON(device->put_kvm || device->kvm); Yes, better. > pfn = symbol_get(kvm_put_kvm); > if (WARN_ON(!pfn)) > return; > > -- > Regards, > Yi Liu
On 2024/7/1 13:42, Yan Zhao wrote: > On Mon, Jul 01, 2024 at 01:08:30PM +0800, Yi Liu wrote: >> On 2024/6/28 23:18, Yan Zhao wrote: >>> In the device cdev path, adjust the handling of the KVM reference count to >>> only increment with the first vfio_df_open() and decrement after the final >>> vfio_df_close(). This change addresses a KVM reference leak that occurs >>> when a device cdev file is opened multiple times and attempts to bind to >>> iommufd repeatedly. >>> >>> Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() >>> in the cdev path during iommufd binding. The corresponding >>> vfio_device_put_kvm() is executed either when iommufd is unbound or if an >>> error occurs during the binding process. >>> >>> However, issues arise when a device binds to iommufd more than once. The >>> second vfio_df_open() will fail during iommufd binding, and >>> vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. >>> Consequently, when iommufd is unbound from the first successfully bound >>> device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the >>> KVM reference count. >> >> Good catch!!! >> >>> Below is the calltrace that will be produced in this scenario when the KVM >>> module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): >>> Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". >>> >>> Call Trace: >>> <TASK> >>> dump_stack_lvl+0x80/0xc0 >>> slab_err+0xb0/0xf0 >>> ? __kmem_cache_shutdown+0xc1/0x4e0 >>> ? rcu_is_watching+0x11/0x50 >>> ? lock_acquired+0x144/0x3c0 >>> __kmem_cache_shutdown+0x1b7/0x4e0 >>> kmem_cache_destroy+0xa6/0x260 >>> kvm_exit+0x80/0xc0 [kvm] >>> vmx_exit+0xe/0x20 [kvm_intel] >>> __x64_sys_delete_module+0x143/0x250 >>> ? ktime_get_coarse_real_ts64+0xd3/0xe0 >>> ? syscall_trace_enter+0x143/0x210 >>> do_syscall_64+0x6f/0x140 >>> entry_SYSCALL_64_after_hwframe+0x76/0x7e >>> >>> Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") >>> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> >>> --- >>> drivers/vfio/device_cdev.c | 13 +++++++++---- >>> 1 file changed, 9 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c >>> index bb1817bd4ff3..3b85d01d1b27 100644 >>> --- a/drivers/vfio/device_cdev.c >>> +++ b/drivers/vfio/device_cdev.c >>> @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, >>> { >>> struct vfio_device *device = df->device; >>> struct vfio_device_bind_iommufd bind; >>> + bool put_kvm = false; >>> unsigned long minsz; >>> int ret; >>> @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, >>> } >>> /* >>> - * Before the device open, get the KVM pointer currently >>> + * Before the device's first open, get the KVM pointer currently >>> * associated with the device file (if there is) and obtain >>> - * a reference. This reference is held until device closed. >>> + * a reference. This reference is held until device's last closed. >>> * Save the pointer in the device for use by drivers. >>> */ >>> - vfio_df_get_kvm_safe(df); >>> + if (device->open_count == 0) { >>> + vfio_df_get_kvm_safe(df); >>> + put_kvm = true; >>> + } >>> ret = vfio_df_open(df); >>> if (ret) >>> @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, >>> out_close_device: >>> vfio_df_close(df); >>> out_put_kvm: >>> - vfio_device_put_kvm(device); >>> + if (put_kvm) >> >> you may use if (device->open_count == 0) as well here to save a bool. >> Otherwise looks good to me. > Upon here, device->open_count is not necessarily 0 even for the first open. > The failure can be after a successful increment of device->open_count. > > Maybe renaming "bool put_kvm" to "bool is_first_open" to save an assignment > in most common case? > >> Reviewed-by: Yi Liu <yi.l.liu@intel.com> > Thanks:) > >> >>> + vfio_device_put_kvm(device); >>> iommufd_ctx_put(df->iommufd); >>> df->iommufd = NULL; >>> out_unlock: >>> >>> base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f >> >> BTW. The vfio_device_get_kvm_safe() is not supposed to be invoked multiple >> times by design as the second call would override the device->put_kvm and >> device->kvm. This does not change the put_kvm nor the kvm though. But not a > "kvm" may also be changed if the second bind is from a different VM. > yep. >> good thing anyghow. maybe worth a warn like below. >> >> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c >> index ee72c0b61795..a4bead0e5820 100644 >> --- a/drivers/vfio/vfio_main.c >> +++ b/drivers/vfio/vfio_main.c >> @@ -408,6 +408,8 @@ void vfio_device_get_kvm_safe(struct vfio_device >> *device, struct kvm *kvm) >> if (!kvm) >> return; >> >> + WARN_ON(device->put_kvm || device->kvm); > Yes, better. > >> pfn = symbol_get(kvm_put_kvm); >> if (WARN_ON(!pfn)) >> return; >> >> -- >> Regards, >> Yi Liu
> From: Zhao, Yan Y <yan.y.zhao@intel.com> > Sent: Friday, June 28, 2024 11:19 PM > > In the device cdev path, adjust the handling of the KVM reference count to > only increment with the first vfio_df_open() and decrement after the final > vfio_df_close(). This change addresses a KVM reference leak that occurs > when a device cdev file is opened multiple times and attempts to bind to > iommufd repeatedly. > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > in the cdev path during iommufd binding. The corresponding > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > error occurs during the binding process. > > However, issues arise when a device binds to iommufd more than once. The > second vfio_df_open() will fail during iommufd binding, and > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > Consequently, when iommufd is unbound from the first successfully bound > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > KVM reference count. To be accurate this happens only when two binds are issued via different fds otherwise below check will happen earlier when two binds are in a same fd: /* one device cannot be bound twice */ if (df->access_granted) { ret = -EINVAL; goto out_unlock; } > > Below is the calltrace that will be produced in this scenario when the KVM > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > Call Trace: > <TASK> > dump_stack_lvl+0x80/0xc0 > slab_err+0xb0/0xf0 > ? __kmem_cache_shutdown+0xc1/0x4e0 > ? rcu_is_watching+0x11/0x50 > ? lock_acquired+0x144/0x3c0 > __kmem_cache_shutdown+0x1b7/0x4e0 > kmem_cache_destroy+0xa6/0x260 > kvm_exit+0x80/0xc0 [kvm] > vmx_exit+0xe/0x20 [kvm_intel] > __x64_sys_delete_module+0x143/0x250 > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > ? syscall_trace_enter+0x143/0x210 > do_syscall_64+0x6f/0x140 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > --- > drivers/vfio/device_cdev.c | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > index bb1817bd4ff3..3b85d01d1b27 100644 > --- a/drivers/vfio/device_cdev.c > +++ b/drivers/vfio/device_cdev.c > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > { > struct vfio_device *device = df->device; > struct vfio_device_bind_iommufd bind; > + bool put_kvm = false; > unsigned long minsz; > int ret; > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > } > > /* > - * Before the device open, get the KVM pointer currently > + * Before the device's first open, get the KVM pointer currently > * associated with the device file (if there is) and obtain > - * a reference. This reference is held until device closed. > + * a reference. This reference is held until device's last closed. > * Save the pointer in the device for use by drivers. > */ > - vfio_df_get_kvm_safe(df); > + if (device->open_count == 0) { > + vfio_df_get_kvm_safe(df); > + put_kvm = true; > + } > > ret = vfio_df_open(df); > if (ret) > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct > vfio_device_file *df, > out_close_device: > vfio_df_close(df); > out_put_kvm: > - vfio_device_put_kvm(device); > + if (put_kvm) > + vfio_device_put_kvm(device); > iommufd_ctx_put(df->iommufd); > df->iommufd = NULL; > out_unlock: > what about extending vfio_df_open() to unify the get/put_kvm() and open_count trick in one place? int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, spinlock_t *kvm_ref_lock) { struct vfio_device *device = df->device; int ret = 0; lockdep_assert_held(&device->dev_set->lock); if (device->open_count == 0) { spin_lock(kvm_ref_lock); vfio_device_get_kvm_safe(device, kvm); spin_unlock(kvm_ref_lock); } /* * Only the group path allows the device to be opened multiple * times. The device cdev path doesn't have a secure way for it. */ if (device->open_count != 0 && !df->group) return -EINVAL; device->open_count++; if (device->open_count == 1) { ret = vfio_df_device_first_open(df); if (ret) device->open_count--; } if (ret) vfio_device_put_kvm(device); return ret; } void vfio_df_close(struct vfio_device_file *df) { struct vfio_device *device = df->device; lockdep_assert_held(&device->dev_set->lock); vfio_assert_device_open(device); if (device->open_count == 1) { vfio_df_device_last_close(df); vfio_device_put_kvm(device); } device->open_count--; }
On 2024/7/1 16:43, Tian, Kevin wrote: >> From: Zhao, Yan Y <yan.y.zhao@intel.com> >> Sent: Friday, June 28, 2024 11:19 PM >> >> In the device cdev path, adjust the handling of the KVM reference count to >> only increment with the first vfio_df_open() and decrement after the final >> vfio_df_close(). This change addresses a KVM reference leak that occurs >> when a device cdev file is opened multiple times and attempts to bind to >> iommufd repeatedly. >> >> Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() >> in the cdev path during iommufd binding. The corresponding >> vfio_device_put_kvm() is executed either when iommufd is unbound or if an >> error occurs during the binding process. >> >> However, issues arise when a device binds to iommufd more than once. The >> second vfio_df_open() will fail during iommufd binding, and >> vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. >> Consequently, when iommufd is unbound from the first successfully bound >> device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the >> KVM reference count. > > To be accurate this happens only when two binds are issued via different > fds otherwise below check will happen earlier when two binds are in a > same fd: > > /* one device cannot be bound twice */ > if (df->access_granted) { > ret = -EINVAL; > goto out_unlock; > } yes >> >> Below is the calltrace that will be produced in this scenario when the KVM >> module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): >> Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". >> >> Call Trace: >> <TASK> >> dump_stack_lvl+0x80/0xc0 >> slab_err+0xb0/0xf0 >> ? __kmem_cache_shutdown+0xc1/0x4e0 >> ? rcu_is_watching+0x11/0x50 >> ? lock_acquired+0x144/0x3c0 >> __kmem_cache_shutdown+0x1b7/0x4e0 >> kmem_cache_destroy+0xa6/0x260 >> kvm_exit+0x80/0xc0 [kvm] >> vmx_exit+0xe/0x20 [kvm_intel] >> __x64_sys_delete_module+0x143/0x250 >> ? ktime_get_coarse_real_ts64+0xd3/0xe0 >> ? syscall_trace_enter+0x143/0x210 >> do_syscall_64+0x6f/0x140 >> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> >> Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") >> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> >> --- >> drivers/vfio/device_cdev.c | 13 +++++++++---- >> 1 file changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c >> index bb1817bd4ff3..3b85d01d1b27 100644 >> --- a/drivers/vfio/device_cdev.c >> +++ b/drivers/vfio/device_cdev.c >> @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct >> vfio_device_file *df, >> { >> struct vfio_device *device = df->device; >> struct vfio_device_bind_iommufd bind; >> + bool put_kvm = false; >> unsigned long minsz; >> int ret; >> >> @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct >> vfio_device_file *df, >> } >> >> /* >> - * Before the device open, get the KVM pointer currently >> + * Before the device's first open, get the KVM pointer currently >> * associated with the device file (if there is) and obtain >> - * a reference. This reference is held until device closed. >> + * a reference. This reference is held until device's last closed. >> * Save the pointer in the device for use by drivers. >> */ >> - vfio_df_get_kvm_safe(df); >> + if (device->open_count == 0) { >> + vfio_df_get_kvm_safe(df); >> + put_kvm = true; >> + } >> >> ret = vfio_df_open(df); >> if (ret) >> @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct >> vfio_device_file *df, >> out_close_device: >> vfio_df_close(df); >> out_put_kvm: >> - vfio_device_put_kvm(device); >> + if (put_kvm) >> + vfio_device_put_kvm(device); >> iommufd_ctx_put(df->iommufd); >> df->iommufd = NULL; >> out_unlock: >> > > what about extending vfio_df_open() to unify the get/put_kvm() > and open_count trick in one place? > > int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, > spinlock_t *kvm_ref_lock) > { this should work. But need a comment to note why need pass in both kvm and kvm_ref_lock given df has both of them. :) > struct vfio_device *device = df->device; > int ret = 0; > > lockdep_assert_held(&device->dev_set->lock); > > if (device->open_count == 0) { > spin_lock(kvm_ref_lock); > vfio_device_get_kvm_safe(device, kvm); > spin_unlock(kvm_ref_lock); > } perhaps it can be put in the "if (device->open_count == 1)" branch, just before invoking vfio_df_device_first_open(). > > /* > * Only the group path allows the device to be opened multiple > * times. The device cdev path doesn't have a secure way for it. > */ > if (device->open_count != 0 && !df->group) > return -EINVAL; > > device->open_count++; > if (device->open_count == 1) { > ret = vfio_df_device_first_open(df); > if (ret) > device->open_count--; > } > > if (ret) > vfio_device_put_kvm(device); > return ret; > } > > void vfio_df_close(struct vfio_device_file *df) > { > struct vfio_device *device = df->device; > > lockdep_assert_held(&device->dev_set->lock); > > vfio_assert_device_open(device); > if (device->open_count == 1) { > vfio_df_device_last_close(df); > vfio_device_put_kvm(device); > } > device->open_count--; > }
On Mon, Jul 01, 2024 at 06:30:05PM +0800, Yi Liu wrote: > On 2024/7/1 16:43, Tian, Kevin wrote: > > > From: Zhao, Yan Y <yan.y.zhao@intel.com> > > > Sent: Friday, June 28, 2024 11:19 PM > > > > > > In the device cdev path, adjust the handling of the KVM reference count to > > > only increment with the first vfio_df_open() and decrement after the final > > > vfio_df_close(). This change addresses a KVM reference leak that occurs > > > when a device cdev file is opened multiple times and attempts to bind to > > > iommufd repeatedly. > > > > > > Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() > > > in the cdev path during iommufd binding. The corresponding > > > vfio_device_put_kvm() is executed either when iommufd is unbound or if an > > > error occurs during the binding process. > > > > > > However, issues arise when a device binds to iommufd more than once. The > > > second vfio_df_open() will fail during iommufd binding, and > > > vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. > > > Consequently, when iommufd is unbound from the first successfully bound > > > device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the > > > KVM reference count. > > > > To be accurate this happens only when two binds are issued via different > > fds otherwise below check will happen earlier when two binds are in a > > same fd: > > > > /* one device cannot be bound twice */ > > if (df->access_granted) { > > ret = -EINVAL; > > goto out_unlock; > > } > > yes > > > > > > > Below is the calltrace that will be produced in this scenario when the KVM > > > module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): > > > Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". > > > > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x80/0xc0 > > > slab_err+0xb0/0xf0 > > > ? __kmem_cache_shutdown+0xc1/0x4e0 > > > ? rcu_is_watching+0x11/0x50 > > > ? lock_acquired+0x144/0x3c0 > > > __kmem_cache_shutdown+0x1b7/0x4e0 > > > kmem_cache_destroy+0xa6/0x260 > > > kvm_exit+0x80/0xc0 [kvm] > > > vmx_exit+0xe/0x20 [kvm_intel] > > > __x64_sys_delete_module+0x143/0x250 > > > ? ktime_get_coarse_real_ts64+0xd3/0xe0 > > > ? syscall_trace_enter+0x143/0x210 > > > do_syscall_64+0x6f/0x140 > > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > > > > Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") > > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> > > > --- > > > drivers/vfio/device_cdev.c | 13 +++++++++---- > > > 1 file changed, 9 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > > > index bb1817bd4ff3..3b85d01d1b27 100644 > > > --- a/drivers/vfio/device_cdev.c > > > +++ b/drivers/vfio/device_cdev.c > > > @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > { > > > struct vfio_device *device = df->device; > > > struct vfio_device_bind_iommufd bind; > > > + bool put_kvm = false; > > > unsigned long minsz; > > > int ret; > > > > > > @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > } > > > > > > /* > > > - * Before the device open, get the KVM pointer currently > > > + * Before the device's first open, get the KVM pointer currently > > > * associated with the device file (if there is) and obtain > > > - * a reference. This reference is held until device closed. > > > + * a reference. This reference is held until device's last closed. > > > * Save the pointer in the device for use by drivers. > > > */ > > > - vfio_df_get_kvm_safe(df); > > > + if (device->open_count == 0) { > > > + vfio_df_get_kvm_safe(df); > > > + put_kvm = true; > > > + } > > > > > > ret = vfio_df_open(df); > > > if (ret) > > > @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct > > > vfio_device_file *df, > > > out_close_device: > > > vfio_df_close(df); > > > out_put_kvm: > > > - vfio_device_put_kvm(device); > > > + if (put_kvm) > > > + vfio_device_put_kvm(device); > > > iommufd_ctx_put(df->iommufd); > > > df->iommufd = NULL; > > > out_unlock: > > > > > > > what about extending vfio_df_open() to unify the get/put_kvm() > > and open_count trick in one place? > > > > int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, > > spinlock_t *kvm_ref_lock) > > { > > this should work. But need a comment to note why need pass in both kvm > and kvm_ref_lock given df has both of them. :) So why to pass them? What about making vfio_device_group_get_kvm_safe() or vfio_df_get_kvm_safe() not static and calling one of them in vfio_df_open() according to the df->group ? > > > struct vfio_device *device = df->device; > > int ret = 0; > > > > lockdep_assert_held(&device->dev_set->lock); > > > > if (device->open_count == 0) { > > spin_lock(kvm_ref_lock); > > vfio_device_get_kvm_safe(device, kvm); > > spin_unlock(kvm_ref_lock); > > } > > perhaps it can be put in the "if (device->open_count == 1)" branch, just > before invoking vfio_df_device_first_open(). What about just moving the get/put_kvm into vfio_df_device_first_open()? > > > > /* > > * Only the group path allows the device to be opened multiple > > * times. The device cdev path doesn't have a secure way for it. > > */ > > if (device->open_count != 0 && !df->group) > > return -EINVAL; > > > > device->open_count++; > > if (device->open_count == 1) { > > ret = vfio_df_device_first_open(df); > > if (ret) > > device->open_count--; > > } > > > > if (ret) > > vfio_device_put_kvm(device); > > return ret; > > } > > > > void vfio_df_close(struct vfio_device_file *df) > > { > > struct vfio_device *device = df->device; > > > > lockdep_assert_held(&device->dev_set->lock); > > > > vfio_assert_device_open(device); > > if (device->open_count == 1) { > > vfio_df_device_last_close(df); > > vfio_device_put_kvm(device); > > } > > device->open_count--; > > } > > -- > Regards, > Yi Liu
> From: Zhao, Yan Y <yan.y.zhao@intel.com> > Sent: Monday, July 1, 2024 8:02 PM > > On Mon, Jul 01, 2024 at 06:30:05PM +0800, Yi Liu wrote: > > On 2024/7/1 16:43, Tian, Kevin wrote: > > > > > > what about extending vfio_df_open() to unify the get/put_kvm() > > > and open_count trick in one place? > > > > > > int vfio_df_open(struct vfio_device_file *df, struct kvm *kvm, > > > spinlock_t *kvm_ref_lock) > > > { > > > > this should work. But need a comment to note why need pass in both kvm > > and kvm_ref_lock given df has both of them. :) > So why to pass them? hmm actually passing them is wrong especially for the group path. We have to get kvm upon the first reference to the pointer otherwise it is prone to use-after-free issue. > > What about making vfio_device_group_get_kvm_safe() or > vfio_df_get_kvm_safe() > not static and calling one of them in vfio_df_open() according to the df- > >group > ? > yeah, this sounds better.
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index bb1817bd4ff3..3b85d01d1b27 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -65,6 +65,7 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, { struct vfio_device *device = df->device; struct vfio_device_bind_iommufd bind; + bool put_kvm = false; unsigned long minsz; int ret; @@ -101,12 +102,15 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, } /* - * Before the device open, get the KVM pointer currently + * Before the device's first open, get the KVM pointer currently * associated with the device file (if there is) and obtain - * a reference. This reference is held until device closed. + * a reference. This reference is held until device's last closed. * Save the pointer in the device for use by drivers. */ - vfio_df_get_kvm_safe(df); + if (device->open_count == 0) { + vfio_df_get_kvm_safe(df); + put_kvm = true; + } ret = vfio_df_open(df); if (ret) @@ -129,7 +133,8 @@ long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df, out_close_device: vfio_df_close(df); out_put_kvm: - vfio_device_put_kvm(device); + if (put_kvm) + vfio_device_put_kvm(device); iommufd_ctx_put(df->iommufd); df->iommufd = NULL; out_unlock:
In the device cdev path, adjust the handling of the KVM reference count to only increment with the first vfio_df_open() and decrement after the final vfio_df_close(). This change addresses a KVM reference leak that occurs when a device cdev file is opened multiple times and attempts to bind to iommufd repeatedly. Currently, vfio_df_get_kvm_safe() is invoked prior to each vfio_df_open() in the cdev path during iommufd binding. The corresponding vfio_device_put_kvm() is executed either when iommufd is unbound or if an error occurs during the binding process. However, issues arise when a device binds to iommufd more than once. The second vfio_df_open() will fail during iommufd binding, and vfio_device_put_kvm() will be triggered, setting device->kvm to NULL. Consequently, when iommufd is unbound from the first successfully bound device, vfio_device_put_kvm() becomes ineffective, leading to a leak in the KVM reference count. Below is the calltrace that will be produced in this scenario when the KVM module is unloaded afterwards, reporting "BUG kvm_vcpu (Tainted: G S): Objects remaining in kvm_vcpu on __kmem_cache_shutdown()". Call Trace: <TASK> dump_stack_lvl+0x80/0xc0 slab_err+0xb0/0xf0 ? __kmem_cache_shutdown+0xc1/0x4e0 ? rcu_is_watching+0x11/0x50 ? lock_acquired+0x144/0x3c0 __kmem_cache_shutdown+0x1b7/0x4e0 kmem_cache_destroy+0xa6/0x260 kvm_exit+0x80/0xc0 [kvm] vmx_exit+0xe/0x20 [kvm_intel] __x64_sys_delete_module+0x143/0x250 ? ktime_get_coarse_real_ts64+0xd3/0xe0 ? syscall_trace_enter+0x143/0x210 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD") Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> --- drivers/vfio/device_cdev.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f