Message ID | 1695407915-12216-1-git-send-email-nunodasneves@linux.microsoft.com (mailing list archive) |
---|---|
Headers | show |
Series | Introduce /dev/mshv drivers | expand |
On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > Add mshv, mshv_root, and mshv_vtl modules: > > Module mshv is the parent module to the other two. It provides /dev/mshv, > plus some common hypercall helper code. When one of the child modules is > loaded, it is registered with the mshv module, which then provides entry > point(s) to the child module via the IOCTLs defined in uapi/linux/mshv.h. > > E.g. When the mshv_root module is loaded, it registers itself, and the > MSHV_CREATE_PARTITION IOCTL becomes available in /dev/mshv. That is used to > get a partition fd managed by mshv_root. > > Similarly for mshv_vtl module, there is MSHV_CREATE_VTL, which creates > an fd representing the lower vtl, managed by mshv_vtl. > > Module mshv_root provides APIs for creating and managing child partitions. > It defines abstractions for partitions (vms), vps (vcpus), and other things > related to running a guest. It exposes the userspace interfaces for a VMM > to manage the guest. > > Module mshv_vtl provides VTL (Virtual Trust Level) support for VMMs. In > this scenario, the host kernel and VMM run in a higher trust level than the > guest, but within the same partition. This provides better isolation and > performance. > > Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> As far as I can tell, all my comments from the previous version are addressed. I believe Saurabh and Boqun's comments are addressed, too. The code looks good to me, so: Acked-by: Wei Liu <wei.liu@kernel.org> I will wait for some time for others to chime in, just in case the community has more comments.
On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > +static int __init mshv_vtl_init(void) > +{ > + int ret; > + > + tasklet_init(&msg_dpc, mshv_vtl_sint_on_msg_dpc, 0); > + init_waitqueue_head(&fd_wait_queue); > + > + if (mshv_vtl_get_vsm_regs()) { > + pr_emerg("%s: Unable to get VSM capabilities !!\n", __func__); > + BUG(); > + } So you crash the whole kernel if someone loads this module on a non-mshv system? That seems quite excessive and hostile :( Or am I somehow reading this incorrectly? thanks, greg k-h
On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > +static int mshv_vtl_get_vsm_regs(void) > +{ > + struct hv_register_assoc registers[2]; > + union hv_input_vtl input_vtl; > + int ret, count = 2; > + > + input_vtl.as_uint8 = 0; > + registers[0].name = HV_REGISTER_VSM_CODE_PAGE_OFFSETS; > + registers[1].name = HV_REGISTER_VSM_CAPABILITIES; > + > + ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, > + count, input_vtl, registers); > + if (ret) > + return ret; > + > + mshv_vsm_page_offsets.as_uint64 = registers[0].value.reg64; > + mshv_vsm_capabilities.as_uint64 = registers[1].value.reg64; > + > + pr_debug("%s: VSM code page offsets: %#016llx\n", __func__, > + mshv_vsm_page_offsets.as_uint64); > + pr_info("%s: VSM capabilities: %#016llx\n", __func__, > + mshv_vsm_capabilities.as_uint64); When drivers are working properly, they are quiet. This is very noisy and probably is leaking memory addresses to userspace? Also, there is NEVER a need for __func__ in a pr_debug() line, it has that for you automatically. Also, drivers should never call pr_*() calls, always use the proper dev_*() calls instead. > + > + return ret; > +} > + > +static int mshv_vtl_configure_vsm_partition(void) > +{ > + union hv_register_vsm_partition_config config; > + struct hv_register_assoc reg_assoc; > + union hv_input_vtl input_vtl; > + > + config.as_u64 = 0; > + config.default_vtl_protection_mask = HV_MAP_GPA_PERMISSIONS_MASK; > + config.enable_vtl_protection = 1; > + config.zero_memory_on_reset = 1; > + config.intercept_vp_startup = 1; > + config.intercept_cpuid_unimplemented = 1; > + > + if (mshv_vsm_capabilities.intercept_page_available) { > + pr_debug("%s: using intercept page", __func__); Again, __func__ is not needed, you are providing it twice here for no real reason except to waste storage space :) > + config.intercept_page = 1; > + } > + > + reg_assoc.name = HV_REGISTER_VSM_PARTITION_CONFIG; > + reg_assoc.value.reg64 = config.as_u64; > + input_vtl.as_uint8 = 0; > + > + return hv_call_set_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, > + 1, input_vtl, ®_assoc); None of this needs to be unwound if initialization fails later on? thanks, greg k-h
Greg, thank you for looking at the code. On Sat, Sep 23, 2023 at 09:56:13AM +0200, Greg KH wrote: > On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > > +static int __init mshv_vtl_init(void) > > +{ > > + int ret; > > + > > + tasklet_init(&msg_dpc, mshv_vtl_sint_on_msg_dpc, 0); > > + init_waitqueue_head(&fd_wait_queue); > > + > > + if (mshv_vtl_get_vsm_regs()) { > > + pr_emerg("%s: Unable to get VSM capabilities !!\n", __func__); > > + BUG(); > > + } > > > So you crash the whole kernel if someone loads this module on a non-mshv > system? > > That seems quite excessive and hostile :( > I agree this can be more lenient. It should be safe to just return an error code (ENODEV) and let user space decide the next action. Saurabh, let me know what you think. Thanks, Wei. > Or am I somehow reading this incorrectly? > > thanks, > > greg k-h >
On Sat, Sep 23, 2023 at 08:58:00PM +0000, Wei Liu wrote: > Greg, thank you for looking at the code. > > On Sat, Sep 23, 2023 at 09:56:13AM +0200, Greg KH wrote: > > On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > > > +static int __init mshv_vtl_init(void) > > > +{ > > > + int ret; > > > + > > > + tasklet_init(&msg_dpc, mshv_vtl_sint_on_msg_dpc, 0); > > > + init_waitqueue_head(&fd_wait_queue); > > > + > > > + if (mshv_vtl_get_vsm_regs()) { > > > + pr_emerg("%s: Unable to get VSM capabilities !!\n", __func__); > > > + BUG(); > > > + } > > > > > > So you crash the whole kernel if someone loads this module on a non-mshv > > system? > > > > That seems quite excessive and hostile :( > > > > I agree this can be more lenient. It should be safe to just return an > error code (ENODEV) and let user space decide the next action. > > Saurabh, let me know what you think. Thanks for reporting this. I agree, returning an error makes more sense here. We should do this for both BUG() in this init. - Saurabh > > Thanks, > Wei. > > > Or am I somehow reading this incorrectly? > > > > thanks, > > > > greg k-h > >
Resend in plain text instead of HTML - oops! On 9/23/2023 12:58 AM, Greg KH wrote: > On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: >> +static int mshv_vtl_get_vsm_regs(void) >> +{ >> + struct hv_register_assoc registers[2]; >> + union hv_input_vtl input_vtl; >> + int ret, count = 2; >> + >> + input_vtl.as_uint8 = 0; >> + registers[0].name = HV_REGISTER_VSM_CODE_PAGE_OFFSETS; >> + registers[1].name = HV_REGISTER_VSM_CAPABILITIES; >> + >> + ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, >> + count, input_vtl, registers); >> + if (ret) >> + return ret; >> + >> + mshv_vsm_page_offsets.as_uint64 = registers[0].value.reg64; >> + mshv_vsm_capabilities.as_uint64 = registers[1].value.reg64; >> + >> + pr_debug("%s: VSM code page offsets: %#016llx\n", __func__, >> + mshv_vsm_page_offsets.as_uint64); >> + pr_info("%s: VSM capabilities: %#016llx\n", __func__, >> + mshv_vsm_capabilities.as_uint64); > > When drivers are working properly, they are quiet. This is very noisy > and probably is leaking memory addresses to userspace? > I will remove these, thanks. > Also, there is NEVER a need for __func__ in a pr_debug() line, it has > that for you automatically. > Thank you, I didn't know this. > Also, drivers should never call pr_*() calls, always use the proper > dev_*() calls instead. > We only use struct device in one place in this driver, I think that is the only place it makes sense to use dev_*() over pr_*() calls. > > >> + >> + return ret; >> +} >> + >> +static int mshv_vtl_configure_vsm_partition(void) >> +{ >> + union hv_register_vsm_partition_config config; >> + struct hv_register_assoc reg_assoc; >> + union hv_input_vtl input_vtl; >> + >> + config.as_u64 = 0; >> + config.default_vtl_protection_mask = HV_MAP_GPA_PERMISSIONS_MASK; >> + config.enable_vtl_protection = 1; >> + config.zero_memory_on_reset = 1; >> + config.intercept_vp_startup = 1; >> + config.intercept_cpuid_unimplemented = 1; >> + >> + if (mshv_vsm_capabilities.intercept_page_available) { >> + pr_debug("%s: using intercept page", __func__); > > Again, __func__ is not needed, you are providing it twice here for no > real reason except to waste storage space :) > Thanks, I will review all the uses of pr_debug(). >> + config.intercept_page = 1; >> + } >> + >> + reg_assoc.name = HV_REGISTER_VSM_PARTITION_CONFIG; >> + reg_assoc.value.reg64 = config.as_u64; >> + input_vtl.as_uint8 = 0; >> + >> + return hv_call_set_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, >> + 1, input_vtl, ®_assoc); > > > None of this needs to be unwound if initialization fails later on? > I think unwinding this is not needed, not 100% sure. Saurabh, can you comment? Thanks, Nuno > thanks, > > greg k-h
On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > On 9/23/2023 12:58 AM, Greg KH wrote: > > Also, drivers should never call pr_*() calls, always use the proper > > dev_*() calls instead. > > > > We only use struct device in one place in this driver, I think that is the > only place it makes sense to use dev_*() over pr_*() calls. Then the driver needs to be fixed to use struct device properly so that you do have access to it when you want to print messages. That's a valid reason to pass around your device structure when needed. thanks, greg k-h
On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: > On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > > On 9/23/2023 12:58 AM, Greg KH wrote: > > > Also, drivers should never call pr_*() calls, always use the proper > > > dev_*() calls instead. > > > > > > > We only use struct device in one place in this driver, I think that is the > > only place it makes sense to use dev_*() over pr_*() calls. > > Then the driver needs to be fixed to use struct device properly so that > you do have access to it when you want to print messages. That's a > valid reason to pass around your device structure when needed. Greg, ACRN and Nitro drivers do not pass around the device structure. Instead, they rely on a global struct device. We can follow the same. Nuno, I checked our code. We already have a misc device. We can use that for dev_* calls. Something like this: dev_warn(mshv_dev.this_device, "this is a warning message"); This should resolve Greg's concern. Thanks, Wei. > > thanks, > > greg k-h >
On Tue, Sep 26, 2023 at 05:54:34AM +0000, Wei Liu wrote: > On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: > > On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > > > On 9/23/2023 12:58 AM, Greg KH wrote: > > > > Also, drivers should never call pr_*() calls, always use the proper > > > > dev_*() calls instead. > > > > > > > > > > We only use struct device in one place in this driver, I think that is the > > > only place it makes sense to use dev_*() over pr_*() calls. > > > > Then the driver needs to be fixed to use struct device properly so that > > you do have access to it when you want to print messages. That's a > > valid reason to pass around your device structure when needed. > > Greg, ACRN and Nitro drivers do not pass around the device structure. > Instead, they rely on a global struct device. We can follow the same. A single global struct device is wrong, please don't do that. Don't copy bad design patterns from other drivers, be better :) thanks, greg k-h
On Tue, Sep 26, 2023 at 08:31:03AM +0200, Greg KH wrote: > On Tue, Sep 26, 2023 at 05:54:34AM +0000, Wei Liu wrote: > > On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: > > > On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > > > > On 9/23/2023 12:58 AM, Greg KH wrote: > > > > > Also, drivers should never call pr_*() calls, always use the proper > > > > > dev_*() calls instead. > > > > > > > > > > > > > We only use struct device in one place in this driver, I think that is the > > > > only place it makes sense to use dev_*() over pr_*() calls. > > > > > > Then the driver needs to be fixed to use struct device properly so that > > > you do have access to it when you want to print messages. That's a > > > valid reason to pass around your device structure when needed. > > > > Greg, ACRN and Nitro drivers do not pass around the device structure. > > Instead, they rely on a global struct device. We can follow the same. > > A single global struct device is wrong, please don't do that. > > Don't copy bad design patterns from other drivers, be better :) > If we're working with real devices like network cards or graphics cards I would agree -- it is easy to imagine that we have several cards of the same model in the system -- but in real world there won't be two hypervisor instances running on the same hardware. We can stash the struct device inside some private data fields, but that doesn't change the fact that we're still having one instance of the structure. Is this what you want? Or do you have something else in mind? Thanks, Wei. > thanks, > > greg k-h >
On Tue, Sep 26, 2023 at 07:00:51AM +0000, Wei Liu wrote: > On Tue, Sep 26, 2023 at 08:31:03AM +0200, Greg KH wrote: > > On Tue, Sep 26, 2023 at 05:54:34AM +0000, Wei Liu wrote: > > > On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: > > > > On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > > > > > On 9/23/2023 12:58 AM, Greg KH wrote: > > > > > > Also, drivers should never call pr_*() calls, always use the proper > > > > > > dev_*() calls instead. > > > > > > > > > > > > > > > > We only use struct device in one place in this driver, I think that is the > > > > > only place it makes sense to use dev_*() over pr_*() calls. > > > > > > > > Then the driver needs to be fixed to use struct device properly so that > > > > you do have access to it when you want to print messages. That's a > > > > valid reason to pass around your device structure when needed. > > > > > > Greg, ACRN and Nitro drivers do not pass around the device structure. > > > Instead, they rely on a global struct device. We can follow the same. > > > > A single global struct device is wrong, please don't do that. > > > > Don't copy bad design patterns from other drivers, be better :) > > > > If we're working with real devices like network cards or graphics cards > I would agree -- it is easy to imagine that we have several cards of the > same model in the system -- but in real world there won't be two > hypervisor instances running on the same hardware. > > We can stash the struct device inside some private data fields, but that > doesn't change the fact that we're still having one instance of the > structure. Is this what you want? Or do you have something else in mind? You have a real device, it's how userspace interacts with your subsystem. Please use that, it is dynamically created and handled and is the correct representation here. thanks, greg k-h
On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > Resend in plain text instead of HTML - oops! > > On 9/23/2023 12:58 AM, Greg KH wrote: > >On Fri, Sep 22, 2023 at 11:38:35AM -0700, Nuno Das Neves wrote: > >>+static int mshv_vtl_get_vsm_regs(void) > >>+{ > >>+ struct hv_register_assoc registers[2]; > >>+ union hv_input_vtl input_vtl; > >>+ int ret, count = 2; > >>+ > >>+ input_vtl.as_uint8 = 0; > >>+ registers[0].name = HV_REGISTER_VSM_CODE_PAGE_OFFSETS; > >>+ registers[1].name = HV_REGISTER_VSM_CAPABILITIES; > >>+ > >>+ ret = hv_call_get_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, > >>+ count, input_vtl, registers); > >>+ if (ret) > >>+ return ret; > >>+ > >>+ mshv_vsm_page_offsets.as_uint64 = registers[0].value.reg64; > >>+ mshv_vsm_capabilities.as_uint64 = registers[1].value.reg64; > >>+ > >>+ pr_debug("%s: VSM code page offsets: %#016llx\n", __func__, > >>+ mshv_vsm_page_offsets.as_uint64); > >>+ pr_info("%s: VSM capabilities: %#016llx\n", __func__, > >>+ mshv_vsm_capabilities.as_uint64); > > > >When drivers are working properly, they are quiet. This is very noisy > >and probably is leaking memory addresses to userspace? > > > > I will remove these, thanks. > > >Also, there is NEVER a need for __func__ in a pr_debug() line, it has > >that for you automatically. > > > > Thank you, I didn't know this. > > >Also, drivers should never call pr_*() calls, always use the proper > >dev_*() calls instead. > > > > We only use struct device in one place in this driver, I think that > is the only place it makes sense to use dev_*() over pr_*() calls. > > > > > >>+ > >>+ return ret; > >>+} > >>+ > >>+static int mshv_vtl_configure_vsm_partition(void) > >>+{ > >>+ union hv_register_vsm_partition_config config; > >>+ struct hv_register_assoc reg_assoc; > >>+ union hv_input_vtl input_vtl; > >>+ > >>+ config.as_u64 = 0; > >>+ config.default_vtl_protection_mask = HV_MAP_GPA_PERMISSIONS_MASK; > >>+ config.enable_vtl_protection = 1; > >>+ config.zero_memory_on_reset = 1; > >>+ config.intercept_vp_startup = 1; > >>+ config.intercept_cpuid_unimplemented = 1; > >>+ > >>+ if (mshv_vsm_capabilities.intercept_page_available) { > >>+ pr_debug("%s: using intercept page", __func__); > > > >Again, __func__ is not needed, you are providing it twice here for no > >real reason except to waste storage space :) > > > > Thanks, I will review all the uses of pr_debug(). > > >>+ config.intercept_page = 1; > >>+ } > >>+ > >>+ reg_assoc.name = HV_REGISTER_VSM_PARTITION_CONFIG; > >>+ reg_assoc.value.reg64 = config.as_u64; > >>+ input_vtl.as_uint8 = 0; > >>+ > >>+ return hv_call_set_vp_registers(HV_VP_INDEX_SELF, HV_PARTITION_ID_SELF, > >>+ 1, input_vtl, ®_assoc); > > > > > >None of this needs to be unwound if initialization fails later on? > > > > I think unwinding this is not needed, not 100% sure. > Saurabh, can you comment? Yes unwinding is not useful here, as these are synthetic register and there is no other use case of VSM supporting platforms other than VSM configuration. In a non VSM supported platform hv_call_set_vp_registers itself will fail for HV_REGISTER_VSM_PARTITION_CONFIG. - Saurabh > > Thanks, > Nuno > > >thanks, > > > >greg k-h >
On 9/26/2023 1:03 AM, Greg KH wrote: > On Tue, Sep 26, 2023 at 07:00:51AM +0000, Wei Liu wrote: >> On Tue, Sep 26, 2023 at 08:31:03AM +0200, Greg KH wrote: >>> On Tue, Sep 26, 2023 at 05:54:34AM +0000, Wei Liu wrote: >>>> On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: >>>>> On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: >>>>>> On 9/23/2023 12:58 AM, Greg KH wrote: >>>>>>> Also, drivers should never call pr_*() calls, always use the proper >>>>>>> dev_*() calls instead. >>>>>>> >>>>>> >>>>>> We only use struct device in one place in this driver, I think that is the >>>>>> only place it makes sense to use dev_*() over pr_*() calls. >>>>> >>>>> Then the driver needs to be fixed to use struct device properly so that >>>>> you do have access to it when you want to print messages. That's a >>>>> valid reason to pass around your device structure when needed. >>>> What is the tangible benefit of using dev_*() over pr_*()? As I said, our use of struct device is very limited compared to all the places we may need to log errors. pr_*() is used by many, many drivers; it seems to be the norm. We can certainly add a pr_fmt to improve the logging. >>>> Greg, ACRN and Nitro drivers do not pass around the device structure. >>>> Instead, they rely on a global struct device. We can follow the same. >>> >>> A single global struct device is wrong, please don't do that. >>> >>> Don't copy bad design patterns from other drivers, be better :) >>> What makes it a bad pattern? It seems to be well-established, and is also used by KVM which this driver is loosely modeled after: https://elixir.bootlin.com/linux/v6.5.5/source/virt/kvm/kvm_main.c#L5128 >> >> If we're working with real devices like network cards or graphics cards >> I would agree -- it is easy to imagine that we have several cards of the >> same model in the system -- but in real world there won't be two >> hypervisor instances running on the same hardware. >> >> We can stash the struct device inside some private data fields, but that >> doesn't change the fact that we're still having one instance of the >> structure. Is this what you want? Or do you have something else in mind? > > You have a real device, it's how userspace interacts with your > subsystem. Please use that, it is dynamically created and handled and > is the correct representation here. > Are you referring to the struct device we get from calling misc_register? If not, please be more specific. How would you suggest we get a reference to that device via e.g. open() or ioctl() without keeping a global reference to it? Thanks, Nuno
On Tue, Sep 26, 2023 at 02:52:36PM -0700, Nuno Das Neves wrote: > On 9/26/2023 1:03 AM, Greg KH wrote: > > On Tue, Sep 26, 2023 at 07:00:51AM +0000, Wei Liu wrote: > > > On Tue, Sep 26, 2023 at 08:31:03AM +0200, Greg KH wrote: > > > > On Tue, Sep 26, 2023 at 05:54:34AM +0000, Wei Liu wrote: > > > > > On Tue, Sep 26, 2023 at 06:52:46AM +0200, Greg KH wrote: > > > > > > On Mon, Sep 25, 2023 at 05:07:24PM -0700, Nuno Das Neves wrote: > > > > > > > On 9/23/2023 12:58 AM, Greg KH wrote: > > > > > > > > Also, drivers should never call pr_*() calls, always use the proper > > > > > > > > dev_*() calls instead. > > > > > > > > > > > > > > > > > > > > > > We only use struct device in one place in this driver, I think that is the > > > > > > > only place it makes sense to use dev_*() over pr_*() calls. > > > > > > > > > > > > Then the driver needs to be fixed to use struct device properly so that > > > > > > you do have access to it when you want to print messages. That's a > > > > > > valid reason to pass around your device structure when needed. > > > > > > > What is the tangible benefit of using dev_*() over pr_*()? Unified reporting and handling of userspace of kernel log messages so they can be classified properly as well as dealing correctly with the dynamic debugging kernel infrastructure. Why wouldn't you want to use it? > As I said, > our use of struct device is very limited compared to all the places we > may need to log errors. Then please fix that. > pr_*() is used by many, many drivers; it seems to be the norm. Not at all, it is not. > We can certainly add a pr_fmt to improve the logging. Please do it correctly so you don't have to go and add support for it later when your tools people ask you why they can't properly parse your driver's kernel log messages. > > > If we're working with real devices like network cards or graphics cards > > > I would agree -- it is easy to imagine that we have several cards of the > > > same model in the system -- but in real world there won't be two > > > hypervisor instances running on the same hardware. > > > > > > We can stash the struct device inside some private data fields, but that > > > doesn't change the fact that we're still having one instance of the > > > structure. Is this what you want? Or do you have something else in mind? > > > > You have a real device, it's how userspace interacts with your > > subsystem. Please use that, it is dynamically created and handled and > > is the correct representation here. > > > > Are you referring to the struct device we get from calling > misc_register? Yes. > How would you suggest we get a reference to that device via e.g. open() > or ioctl() without keeping a global reference to it? You explicitly have it in your open() and ioctl() call, you never need a global reference for it the kernel gives it to you! thanks, greg k-h
Hi Greg It is past midnight here, so I may not be able to articulate my thoughts very well. I want to avoid waiting for another day for another round trip of emails though. We can look at your reply in the morning and reply again. On Wed, Sep 27, 2023 at 08:01:01AM +0200, Greg KH wrote: [...] > > > > If we're working with real devices like network cards or graphics cards > > > > I would agree -- it is easy to imagine that we have several cards of the > > > > same model in the system -- but in real world there won't be two > > > > hypervisor instances running on the same hardware. > > > > > > > > We can stash the struct device inside some private data fields, but that > > > > doesn't change the fact that we're still having one instance of the > > > > structure. Is this what you want? Or do you have something else in mind? > > > > > > You have a real device, it's how userspace interacts with your > > > subsystem. Please use that, it is dynamically created and handled and > > > is the correct representation here. > > > > > > > Are you referring to the struct device we get from calling > > misc_register? > > Yes. > We know about this, please see below. And we plan to use this. > > How would you suggest we get a reference to that device via e.g. open() > > or ioctl() without keeping a global reference to it? > > You explicitly have it in your open() and ioctl() call, you never need a > global reference for it the kernel gives it to you! > This is what I don't follow. Nuno and I discussed this today offline. We looked at the code before and looked again today (well, yesterday now). Here are the two functions: int vfs_open(const struct path *path, struct file *file) long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) Or, if we provide an open function in our file_operations struct, we get an additional struct inode pointer. int (*open) (struct inode *, struct file *); Neither struct file nor struct inode contains a reference to struct device. Then in vfs.rst, there is a section about open: ``open`` called by the VFS when an inode should be opened. When the VFS opens a file, it creates a new "struct file". It then calls the open method for the newly allocated file structure. You might think that the open method really belongs in "struct inode_operations", and you may be right. I think it's done the way it is because it makes filesystems simpler to implement. The open() method is a good place to initialize the "private_data" member in the file structure if you want to point to a device structure So, the driver is supposed to stash a pointer to struct device in private_data. That's what I alluded to in my previous reply. The core driver framework or the VFS doesn't give us a reference to struct device. We have to do it ourselves. We can do that for sure, but the struct device we stash into private_data is going to be the one that is returned from misc_register, which at the same time is already stashed inside a static variable in our driver by our own code (Note that this is a pervasive pattern in the kernel). I hope this is clear. If we're missing something extremely obvious, that somehow we can get a reference to struct device from the VFS while opening the file or doing ioctl without stashing it ourselves in our own code, please let us know. At this point I feel like either I'm extremely stupid or we're just talking past each other. If you tell me it is the former and help me understand how we can achieve what you described, I am more than happy to learn new things I don't know or understand. :-) If we have to propagate that reference ourselves, that then leads to next question whether it will just be more convenient to use the stashed value in the static variable directly like other drivers do, instead of stashing and propagating it around, knowing 100% it is the same object. I don't feel too strongly about this. As long as we are on the same page about getting a reference to struct device, we can do it either way. Thanks, Wei. > thanks, > > greg k-h >
On Wed, Sep 27, 2023 at 08:04:42AM +0000, Wei Liu wrote: > On Wed, Sep 27, 2023 at 08:01:01AM +0200, Greg KH wrote: > [...] > > > > > If we're working with real devices like network cards or graphics cards > > > > > I would agree -- it is easy to imagine that we have several cards of the > > > > > same model in the system -- but in real world there won't be two > > > > > hypervisor instances running on the same hardware. > > > > > > > > > > We can stash the struct device inside some private data fields, but that > > > > > doesn't change the fact that we're still having one instance of the > > > > > structure. Is this what you want? Or do you have something else in mind? > > > > > > > > You have a real device, it's how userspace interacts with your > > > > subsystem. Please use that, it is dynamically created and handled and > > > > is the correct representation here. > > > > > > > > > > Are you referring to the struct device we get from calling > > > misc_register? > > > > Yes. > > > > We know about this, please see below. And we plan to use this. > > > > How would you suggest we get a reference to that device via e.g. open() > > > or ioctl() without keeping a global reference to it? > > > > You explicitly have it in your open() and ioctl() call, you never need a > > global reference for it the kernel gives it to you! > > > > This is what I don't follow. > > Nuno and I discussed this today offline. We looked at the code before > and looked again today (well, yesterday now). > > Here are the two functions: > > int vfs_open(const struct path *path, struct file *file) > long vfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) > > Or, if we provide an open function in our file_operations struct, we get > an additional struct inode pointer. > > int (*open) (struct inode *, struct file *); > > Neither struct file nor struct inode contains a reference to struct device. > > Then in vfs.rst, there is a section about open: > > ``open`` > called by the VFS when an inode should be opened. When the VFS > opens a file, it creates a new "struct file". It then calls the > open method for the newly allocated file structure. You might > think that the open method really belongs in "struct > inode_operations", and you may be right. I think it's done the > way it is because it makes filesystems simpler to implement. > The open() method is a good place to initialize the > "private_data" member in the file structure if you want to point > to a device structure > > So, the driver is supposed to stash a pointer to struct device in > private_data. That's what I alluded to in my previous reply. The core > driver framework or the VFS doesn't give us a reference to struct > device. We have to do it ourselves. Please read Linux Device Drivers, 3rd edition, chapter 3, for how to do this properly. The book is free online. Also look at the zillion in-kernel example drivers that use the misc device api, container_of() is your friend... > We can do that for sure, but the struct device we stash into > private_data is going to be the one that is returned from misc_register, > which at the same time is already stashed inside a static variable in > our driver by our own code (Note that this is a pervasive pattern in the > kernel). Again, don't make this static, there's no requirement to do so. But even if you do, sure, use it this way, you have a device. But I would strongly discourage you from having a static variable, there is no need to do this at all, and no one else should do so either. thanks, greg k-h
On 9/27/2023 1:33 AM, Greg KH wrote: > On Wed, Sep 27, 2023 at 08:04:42AM +0000, Wei Liu wrote: >> So, the driver is supposed to stash a pointer to struct device in >> private_data. That's what I alluded to in my previous reply. The core >> driver framework or the VFS doesn't give us a reference to struct >> device. We have to do it ourselves. > > Please read Linux Device Drivers, 3rd edition, chapter 3, for how to do > this properly. The book is free online. > Thanks, the issue that confused us was how to get the miscdevice. I eventually found the answer in the misc_register() documentation: "By default, an open() syscall to the device sets file->private_data to point to the structure." That's good - when we create a guest, we will have the miscdevice in private_data already. Then we can just put it in our per-guest data structure. That will let us retrieve the device in the other ioctls so we can call dev_*(). Thanks, Nuno