Message ID | 20250226200612.2062-3-mhklinux@outlook.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Delegated to: | Krzysztof WilczyĆski |
Headers | show |
Series | hyperv: Introduce new way to manage hypercall args | expand |
On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote: > From: Michael Kelley <mhklinux@outlook.com> > > Current code allocates the "hyperv_pcpu_input_arg", and in > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB > page of memory allocated per-vCPU. A hypercall call site disables > interrupts, then uses this memory to set up the input parameters for > the hypercall, read the output results after hypercall execution, and > re-enable interrupts. The open coding of these steps leads to > inconsistencies, and in some cases, violation of the generic > requirements for the hypercall input and output as described in the > Hyper-V Top Level Functional Spec (TLFS)[1]. > > To reduce these kinds of problems, introduce a family of inline > functions to replace the open coding. The functions provide a new way > to manage the use of this per-vCPU memory that is usually the input and > output arguments to Hyper-V hypercalls. The functions encapsulate > key aspects of the usage and ensure that the TLFS requirements are > met (max size of 1 page each for input and output, no overlap of > input and output, aligned to 8 bytes, etc.). Conceptually, there > is no longer a difference between the "per-vCPU input page" and > "per-vCPU output page". Only a single per-vCPU page is allocated, and > it provides both hypercall input and output memory. All current > hypercalls can fit their input and output within that single page, > though the new code allows easy changing to two pages should a future > hypercall require a full page for each of the input and output. > > The new functions always zero the fixed-size portion of the hypercall > input area so that uninitialized memory is not inadvertently passed > to the hypercall. Current open-coded hypercall call sites are > inconsistent on this point, and use of the new functions addresses > that inconsistency. The output area is not zero'ed by the new code > as it is Hyper-V's responsibility to provide legal output. > > When the input or output (or both) contain an array, the new functions > calculate and return how many array entries fit within the per-cpu > memory page, which is effectively the "batch size" for the hypercall > processing multiple entries. This batch size can then be used in the > hypercall control word to specify the repetition count. This > calculation of the batch size replaces current open coding of the > batch size, which is prone to errors. Note that the array portion of > the input area is *not* zero'ed. The arrays are almost always 64-bit > GPAs or something similar, and zero'ing that much memory seems > wasteful at runtime when it will all be overwritten. The hypercall > call site is responsible for ensuring that no part of the array is > left uninitialized (just as with current code). > > The new functions are realized as a single inline function that > handles the most complex case, which is a hypercall with input > and output, both of which contain arrays. Simpler cases are mapped to > this most complex case with #define wrappers that provide zero or NULL > for some arguments. Several of the arguments to this new function are > expected to be compile-time constants generated by "sizeof()" > expressions. As such, most of the code in the new function can be > evaluated by the compiler, with the result that the code paths are > no longer than with the current open coding. The one exception is > new code generated to zero the fixed-size portion of the input area > in cases where it is not currently done. > > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs > > Signed-off-by: Michael Kelley <mhklinux@outlook.com> > --- > include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++ > 1 file changed, 102 insertions(+) > > diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h > index b13b0cda4ac8..0c8a9133cf1a 100644 > --- a/include/asm-generic/mshyperv.h > +++ b/include/asm-generic/mshyperv.h > @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size, > return status; > } > > +/* > + * Hypercall input and output argument setup > + */ > + > +/* Temporary mapping to be removed at the end of the patch series */ > +#define hyperv_pcpu_arg hyperv_pcpu_input_arg > + > +/* > + * Allocate one page that is shared between input and output args, which is > + * sufficient for all current hypercalls. If a future hypercall requires > + * more space, change this value to "2" and everything will work. > + */ > +#define HV_HVCALL_ARG_PAGES 1 > + > +/* > + * Allocate space for hypercall input and output arguments from the > + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input > + * or output indicates to allocate no space for that argument. For input and > + * for output, specify the size of the fixed portion, and the size of an > + * element in a variable size array. A zero value for entry_size indicates > + * there is no array. The fixed size space for the input is zero'ed. > + * It might be worth explicitly mentioning that interrupts should be disabled when calling this function. > + * When variable size arrays are present, the function returns the number of > + * elements (i.e, the batch size) that fit in the available space. > + * > + * The four "size" arguments are expected to be constants, in which case the > + * compiler does most of the calculations. Then the generated inline code is no > + * larger than if open coding the access to the hyperv_pcpu_arg and doing > + * memset() on the input. > + */ > +static inline int hv_hvcall_inout_array( > + void *input, u32 in_size, u32 in_entry_size, > + void *output, u32 out_size, u32 out_entry_size) Is there a reason input and output are void * instead of void ** here? > +{ > + u32 in_batch_count = 0, out_batch_count = 0, batch_count; > + u32 in_total_size, out_total_size, offset; > + u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES; > + void *space; > + > + /* > + * If input and output have arrays, allocate half the space to input > + * and half to output. If only input has an array, the array can use > + * all the space except for the fixed size output (but not to exceed > + * one page), and vice versa. > + */ > + if (in_entry_size && out_entry_size) > + batch_space = batch_space / 2; > + else if (in_entry_size) > + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size); > + else if (out_entry_size) > + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size); > + > + if (in_entry_size) > + in_batch_count = (batch_space - in_size) / in_entry_size; > + if (out_entry_size) > + out_batch_count = (batch_space - out_size) / out_entry_size; > + > + /* > + * If input and output have arrays, use the smaller of the two batch > + * counts, in case they are different. If only one has an array, use > + * that batch count. batch_count will be zero if neither has an array. > + */ > + if (in_batch_count && out_batch_count) > + batch_count = min(in_batch_count, out_batch_count); > + else > + batch_count = in_batch_count | out_batch_count; > + > + in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8); > + out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8); > + > + space = *this_cpu_ptr(hyperv_pcpu_arg); > + if (input) { > + *(void **)input = space; > + if (space) > + /* Zero the fixed size portion, not any array portion */ > + memset(space, 0, ALIGN(in_size, 8)); > + } > + > + if (output) { > + if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) { > + offset = in_total_size; > + } else { > + offset = HV_HYP_PAGE_SIZE; > + /* Need more than 1 page, but only 1 was allocated */ > + BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1); Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at compile-time? So will this also fail if any of the args in_size, in_entry_size, out_size, out_entry_size are runtime-known? Nuno > + } > + *(void **)output = space + offset; > + } > + > + return batch_count; > +} > + > +/* Wrappers for some of the simpler cases with only input, or with no arrays */ > +#define hv_hvcall_in(input, in_size) \ > + hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0) > + > +#define hv_hvcall_inout(input, in_size, output, out_size) \ > + hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0) > + > +#define hv_hvcall_in_array(input, in_size, in_entry_size) \ > + hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0) > + > /* Generate the guest OS identifier as described in the Hyper-V TLFS */ > static inline u64 hv_generate_guest_id(u64 kernel_version) > {
From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, February 26, 2025 4:15 PM > > On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote: > > From: Michael Kelley <mhklinux@outlook.com> > > > > Current code allocates the "hyperv_pcpu_input_arg", and in > > some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB > > page of memory allocated per-vCPU. A hypercall call site disables > > interrupts, then uses this memory to set up the input parameters for > > the hypercall, read the output results after hypercall execution, and > > re-enable interrupts. The open coding of these steps leads to > > inconsistencies, and in some cases, violation of the generic > > requirements for the hypercall input and output as described in the > > Hyper-V Top Level Functional Spec (TLFS)[1]. > > > > To reduce these kinds of problems, introduce a family of inline > > functions to replace the open coding. The functions provide a new way > > to manage the use of this per-vCPU memory that is usually the input and > > output arguments to Hyper-V hypercalls. The functions encapsulate > > key aspects of the usage and ensure that the TLFS requirements are > > met (max size of 1 page each for input and output, no overlap of > > input and output, aligned to 8 bytes, etc.). Conceptually, there > > is no longer a difference between the "per-vCPU input page" and > > "per-vCPU output page". Only a single per-vCPU page is allocated, and > > it provides both hypercall input and output memory. All current > > hypercalls can fit their input and output within that single page, > > though the new code allows easy changing to two pages should a future > > hypercall require a full page for each of the input and output. > > > > The new functions always zero the fixed-size portion of the hypercall > > input area so that uninitialized memory is not inadvertently passed > > to the hypercall. Current open-coded hypercall call sites are > > inconsistent on this point, and use of the new functions addresses > > that inconsistency. The output area is not zero'ed by the new code > > as it is Hyper-V's responsibility to provide legal output. > > > > When the input or output (or both) contain an array, the new functions > > calculate and return how many array entries fit within the per-cpu > > memory page, which is effectively the "batch size" for the hypercall > > processing multiple entries. This batch size can then be used in the > > hypercall control word to specify the repetition count. This > > calculation of the batch size replaces current open coding of the > > batch size, which is prone to errors. Note that the array portion of > > the input area is *not* zero'ed. The arrays are almost always 64-bit > > GPAs or something similar, and zero'ing that much memory seems > > wasteful at runtime when it will all be overwritten. The hypercall > > call site is responsible for ensuring that no part of the array is > > left uninitialized (just as with current code). > > > > The new functions are realized as a single inline function that > > handles the most complex case, which is a hypercall with input > > and output, both of which contain arrays. Simpler cases are mapped to > > this most complex case with #define wrappers that provide zero or NULL > > for some arguments. Several of the arguments to this new function are > > expected to be compile-time constants generated by "sizeof()" > > expressions. As such, most of the code in the new function can be > > evaluated by the compiler, with the result that the code paths are > > no longer than with the current open coding. The one exception is > > new code generated to zero the fixed-size portion of the input area > > in cases where it is not currently done. > > > > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs > > > > Signed-off-by: Michael Kelley <mhklinux@outlook.com> > > --- > > include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++ > > 1 file changed, 102 insertions(+) > > > > diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h > > index b13b0cda4ac8..0c8a9133cf1a 100644 > > --- a/include/asm-generic/mshyperv.h > > +++ b/include/asm-generic/mshyperv.h > > @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size, > > return status; > > } > > > > +/* > > + * Hypercall input and output argument setup > > + */ > > + > > +/* Temporary mapping to be removed at the end of the patch series */ > > +#define hyperv_pcpu_arg hyperv_pcpu_input_arg > > + > > +/* > > + * Allocate one page that is shared between input and output args, which is > > + * sufficient for all current hypercalls. If a future hypercall requires > > + * more space, change this value to "2" and everything will work. > > + */ > > +#define HV_HVCALL_ARG_PAGES 1 > > + > > +/* > > + * Allocate space for hypercall input and output arguments from the > > + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input > > + * or output indicates to allocate no space for that argument. For input and > > + * for output, specify the size of the fixed portion, and the size of an > > + * element in a variable size array. A zero value for entry_size indicates > > + * there is no array. The fixed size space for the input is zero'ed. > > + * > It might be worth explicitly mentioning that interrupts should be disabled when > calling this function. Agreed. > > > + * When variable size arrays are present, the function returns the number of > > + * elements (i.e, the batch size) that fit in the available space. > > + * > > + * The four "size" arguments are expected to be constants, in which case the > > + * compiler does most of the calculations. Then the generated inline code is no > > + * larger than if open coding the access to the hyperv_pcpu_arg and doing > > + * memset() on the input. > > + */ > > +static inline int hv_hvcall_inout_array( > > + void *input, u32 in_size, u32 in_entry_size, > > + void *output, u32 out_size, u32 out_entry_size) > Is there a reason input and output are void * instead of void ** here? Yes -- it must be void *, and not void **. Consider a function like get_vtl() in Patch 3 of this series where local variable "input" is declared as: struct hv_input_get_vp_registers *input; Then the first argument to hv_hvcall_inout() will be of type struct hv_input_get_vp_registers **. The compiler does not consider this to match void ** and would generate an error. But struct hv_input_get_vp_registers ** _does_ match void * and compiles with no error. It makes sense when you think about it, though it isn't obvious. I tried to use void ** initially and had to figure out why the code wouldn't compile. :-) > > > +{ > > + u32 in_batch_count = 0, out_batch_count = 0, batch_count; > > + u32 in_total_size, out_total_size, offset; > > + u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES; > > + void *space; > > + > > + /* > > + * If input and output have arrays, allocate half the space to input > > + * and half to output. If only input has an array, the array can use > > + * all the space except for the fixed size output (but not to exceed > > + * one page), and vice versa. > > + */ > > + if (in_entry_size && out_entry_size) > > + batch_space = batch_space / 2; > > + else if (in_entry_size) > > + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size); > > + else if (out_entry_size) > > + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size); > > + > > + if (in_entry_size) > > + in_batch_count = (batch_space - in_size) / in_entry_size; > > + if (out_entry_size) > > + out_batch_count = (batch_space - out_size) / out_entry_size; > > + > > + /* > > + * If input and output have arrays, use the smaller of the two batch > > + * counts, in case they are different. If only one has an array, use > > + * that batch count. batch_count will be zero if neither has an array. > > + */ > > + if (in_batch_count && out_batch_count) > > + batch_count = min(in_batch_count, out_batch_count); > > + else > > + batch_count = in_batch_count | out_batch_count; > > + > > + in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8); > > + out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8); > > + > > + space = *this_cpu_ptr(hyperv_pcpu_arg); > > + if (input) { > > + *(void **)input = space; > > + if (space) > > + /* Zero the fixed size portion, not any array portion */ > > + memset(space, 0, ALIGN(in_size, 8)); > > + } > > + > > + if (output) { > > + if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) { > > + offset = in_total_size; > > + } else { > > + offset = HV_HYP_PAGE_SIZE; > > + /* Need more than 1 page, but only 1 was allocated */ > > + BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1); > Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset > because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at > compile-time? Correct. And if for some future hypercall in_total_size + out_total_size is *not* <= HV_HYP_PAGE_SIZE, the BUILD_BUG_ON() will alert the developer to the problem. Depending on the argument requirements of this future hypercall, the solution might require changing HV_HVCALL_ARG_PAGES to 2. > So will this also fail if any of the args in_size, in_entry_size, out_size, > out_entry_size are runtime-known? Correct. I should change my wording about "The four 'size' arguments are expected to be constants" to ". . . must be constants". OTOH, if there's a need to support non-constants for any of these arguments, some additional code could handle that case. But the overall hv_hvcall_inout_array() function will end up generating a lot of code to execute at runtime. I looked at the hypercall call sites in the OHCL kernel tree, and didn't see any need for non-constants, but I haven't looked yet at the v4 patch series you just posted. Let me know if you have a case requiring non-constants. Michael
On 2/26/2025 5:50 PM, Michael Kelley wrote: > From: Nuno Das Neves <nunodasneves@linux.microsoft.com> Sent: Wednesday, February 26, 2025 4:15 PM >> >> On 2/26/2025 12:06 PM, mhkelley58@gmail.com wrote: >>> From: Michael Kelley <mhklinux@outlook.com> >>> >>> Current code allocates the "hyperv_pcpu_input_arg", and in >>> some configurations, the "hyperv_pcpu_output_arg". Each is a 4 KiB >>> page of memory allocated per-vCPU. A hypercall call site disables >>> interrupts, then uses this memory to set up the input parameters for >>> the hypercall, read the output results after hypercall execution, and >>> re-enable interrupts. The open coding of these steps leads to >>> inconsistencies, and in some cases, violation of the generic >>> requirements for the hypercall input and output as described in the >>> Hyper-V Top Level Functional Spec (TLFS)[1]. >>> >>> To reduce these kinds of problems, introduce a family of inline >>> functions to replace the open coding. The functions provide a new way >>> to manage the use of this per-vCPU memory that is usually the input and >>> output arguments to Hyper-V hypercalls. The functions encapsulate >>> key aspects of the usage and ensure that the TLFS requirements are >>> met (max size of 1 page each for input and output, no overlap of >>> input and output, aligned to 8 bytes, etc.). Conceptually, there >>> is no longer a difference between the "per-vCPU input page" and >>> "per-vCPU output page". Only a single per-vCPU page is allocated, and >>> it provides both hypercall input and output memory. All current >>> hypercalls can fit their input and output within that single page, >>> though the new code allows easy changing to two pages should a future >>> hypercall require a full page for each of the input and output. >>> >>> The new functions always zero the fixed-size portion of the hypercall >>> input area so that uninitialized memory is not inadvertently passed >>> to the hypercall. Current open-coded hypercall call sites are >>> inconsistent on this point, and use of the new functions addresses >>> that inconsistency. The output area is not zero'ed by the new code >>> as it is Hyper-V's responsibility to provide legal output. >>> >>> When the input or output (or both) contain an array, the new functions >>> calculate and return how many array entries fit within the per-cpu >>> memory page, which is effectively the "batch size" for the hypercall >>> processing multiple entries. This batch size can then be used in the >>> hypercall control word to specify the repetition count. This >>> calculation of the batch size replaces current open coding of the >>> batch size, which is prone to errors. Note that the array portion of >>> the input area is *not* zero'ed. The arrays are almost always 64-bit >>> GPAs or something similar, and zero'ing that much memory seems >>> wasteful at runtime when it will all be overwritten. The hypercall >>> call site is responsible for ensuring that no part of the array is >>> left uninitialized (just as with current code). >>> >>> The new functions are realized as a single inline function that >>> handles the most complex case, which is a hypercall with input >>> and output, both of which contain arrays. Simpler cases are mapped to >>> this most complex case with #define wrappers that provide zero or NULL >>> for some arguments. Several of the arguments to this new function are >>> expected to be compile-time constants generated by "sizeof()" >>> expressions. As such, most of the code in the new function can be >>> evaluated by the compiler, with the result that the code paths are >>> no longer than with the current open coding. The one exception is >>> new code generated to zero the fixed-size portion of the input area >>> in cases where it is not currently done. >>> >>> [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs >>> >>> Signed-off-by: Michael Kelley <mhklinux@outlook.com> >>> --- >>> include/asm-generic/mshyperv.h | 102 +++++++++++++++++++++++++++++++++ >>> 1 file changed, 102 insertions(+) >>> >>> diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h >>> index b13b0cda4ac8..0c8a9133cf1a 100644 >>> --- a/include/asm-generic/mshyperv.h >>> +++ b/include/asm-generic/mshyperv.h >>> @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size, >>> return status; >>> } >>> >>> +/* >>> + * Hypercall input and output argument setup >>> + */ >>> + >>> +/* Temporary mapping to be removed at the end of the patch series */ >>> +#define hyperv_pcpu_arg hyperv_pcpu_input_arg >>> + >>> +/* >>> + * Allocate one page that is shared between input and output args, which is >>> + * sufficient for all current hypercalls. If a future hypercall requires >>> + * more space, change this value to "2" and everything will work. >>> + */ >>> +#define HV_HVCALL_ARG_PAGES 1 >>> + >>> +/* >>> + * Allocate space for hypercall input and output arguments from the >>> + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input >>> + * or output indicates to allocate no space for that argument. For input and >>> + * for output, specify the size of the fixed portion, and the size of an >>> + * element in a variable size array. A zero value for entry_size indicates >>> + * there is no array. The fixed size space for the input is zero'ed. >>> + * >> It might be worth explicitly mentioning that interrupts should be disabled when >> calling this function. > > Agreed. > >> >>> + * When variable size arrays are present, the function returns the number of >>> + * elements (i.e, the batch size) that fit in the available space. >>> + * >>> + * The four "size" arguments are expected to be constants, in which case the >>> + * compiler does most of the calculations. Then the generated inline code is no >>> + * larger than if open coding the access to the hyperv_pcpu_arg and doing >>> + * memset() on the input. >>> + */ >>> +static inline int hv_hvcall_inout_array( >>> + void *input, u32 in_size, u32 in_entry_size, >>> + void *output, u32 out_size, u32 out_entry_size) >> Is there a reason input and output are void * instead of void ** here? > > Yes -- it must be void *, and not void **. Consider a function like get_vtl() > in Patch 3 of this series where local variable "input" is declared as: > > struct hv_input_get_vp_registers *input; > > Then the first argument to hv_hvcall_inout() will be of type > struct hv_input_get_vp_registers **. The compiler does not consider > this to match void ** and would generate an error. But > struct hv_input_get_vp_registers ** _does_ match void * and compiles > with no error. It makes sense when you think about it, though it > isn't obvious. I tried to use void ** initially and had to figure out > why the code wouldn't compile. :-) > Ah, that does make sense. Okay then, fair enough! >> >>> +{ >>> + u32 in_batch_count = 0, out_batch_count = 0, batch_count; >>> + u32 in_total_size, out_total_size, offset; >>> + u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES; >>> + void *space; >>> + >>> + /* >>> + * If input and output have arrays, allocate half the space to input >>> + * and half to output. If only input has an array, the array can use >>> + * all the space except for the fixed size output (but not to exceed >>> + * one page), and vice versa. >>> + */ >>> + if (in_entry_size && out_entry_size) >>> + batch_space = batch_space / 2; >>> + else if (in_entry_size) >>> + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size); >>> + else if (out_entry_size) >>> + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size); >>> + >>> + if (in_entry_size) >>> + in_batch_count = (batch_space - in_size) / in_entry_size; >>> + if (out_entry_size) >>> + out_batch_count = (batch_space - out_size) / out_entry_size; >>> + >>> + /* >>> + * If input and output have arrays, use the smaller of the two batch >>> + * counts, in case they are different. If only one has an array, use >>> + * that batch count. batch_count will be zero if neither has an array. >>> + */ >>> + if (in_batch_count && out_batch_count) >>> + batch_count = min(in_batch_count, out_batch_count); >>> + else >>> + batch_count = in_batch_count | out_batch_count; >>> + >>> + in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8); >>> + out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8); >>> + >>> + space = *this_cpu_ptr(hyperv_pcpu_arg); >>> + if (input) { >>> + *(void **)input = space; >>> + if (space) >>> + /* Zero the fixed size portion, not any array portion */ >>> + memset(space, 0, ALIGN(in_size, 8)); >>> + } >>> + >>> + if (output) { >>> + if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) { >>> + offset = in_total_size; >>> + } else { >>> + offset = HV_HYP_PAGE_SIZE; >>> + /* Need more than 1 page, but only 1 was allocated */ >>> + BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1); >> Interesting... so the compiler is not compiling this BUILD_BUG_ON in your patchset >> because in_total_size + out_total_size <= HV_HYP_PAGE_SIZE is always known at >> compile-time? > > Correct. And if for some future hypercall in_total_size + out_total_size is > *not* <= HV_HYP_PAGE_SIZE, the BUILD_BUG_ON() will alert the developer > to the problem. Depending on the argument requirements of this future > hypercall, the solution might require changing HV_HVCALL_ARG_PAGES to 2. > >> So will this also fail if any of the args in_size, in_entry_size, out_size, >> out_entry_size are runtime-known? > > Correct. I should change my wording about "The four 'size' arguments are > expected to be constants" to ". . . must be constants". OTOH, if there's a need > to support non-constants for any of these arguments, some additional code > could handle that case. But the overall hv_hvcall_inout_array() function will > end up generating a lot of code to execute at runtime. I looked at the hypercall > call sites in the OHCL kernel tree, and didn't see any need for non-constants, > but I haven't looked yet at the v4 patch series you just posted. Let me know > if you have a case requiring non-constants. > I don't think we ever use non-constants. In fact I can't think of a case where these values aren't the result of a sizeof() (or 0). Overall I think this approach is looking quite nice and I'd be happy to adopt it in the mshv driver code when this is merged. With the comment change mentioned above: Reviewed-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> > Michael
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index b13b0cda4ac8..0c8a9133cf1a 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -135,6 +135,108 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size, return status; } +/* + * Hypercall input and output argument setup + */ + +/* Temporary mapping to be removed at the end of the patch series */ +#define hyperv_pcpu_arg hyperv_pcpu_input_arg + +/* + * Allocate one page that is shared between input and output args, which is + * sufficient for all current hypercalls. If a future hypercall requires + * more space, change this value to "2" and everything will work. + */ +#define HV_HVCALL_ARG_PAGES 1 + +/* + * Allocate space for hypercall input and output arguments from the + * pre-allocated per-cpu hyperv_pcpu_args page(s). A NULL value for the input + * or output indicates to allocate no space for that argument. For input and + * for output, specify the size of the fixed portion, and the size of an + * element in a variable size array. A zero value for entry_size indicates + * there is no array. The fixed size space for the input is zero'ed. + * + * When variable size arrays are present, the function returns the number of + * elements (i.e, the batch size) that fit in the available space. + * + * The four "size" arguments are expected to be constants, in which case the + * compiler does most of the calculations. Then the generated inline code is no + * larger than if open coding the access to the hyperv_pcpu_arg and doing + * memset() on the input. + */ +static inline int hv_hvcall_inout_array( + void *input, u32 in_size, u32 in_entry_size, + void *output, u32 out_size, u32 out_entry_size) +{ + u32 in_batch_count = 0, out_batch_count = 0, batch_count; + u32 in_total_size, out_total_size, offset; + u32 batch_space = HV_HYP_PAGE_SIZE * HV_HVCALL_ARG_PAGES; + void *space; + + /* + * If input and output have arrays, allocate half the space to input + * and half to output. If only input has an array, the array can use + * all the space except for the fixed size output (but not to exceed + * one page), and vice versa. + */ + if (in_entry_size && out_entry_size) + batch_space = batch_space / 2; + else if (in_entry_size) + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - out_size); + else if (out_entry_size) + batch_space = min(HV_HYP_PAGE_SIZE, batch_space - in_size); + + if (in_entry_size) + in_batch_count = (batch_space - in_size) / in_entry_size; + if (out_entry_size) + out_batch_count = (batch_space - out_size) / out_entry_size; + + /* + * If input and output have arrays, use the smaller of the two batch + * counts, in case they are different. If only one has an array, use + * that batch count. batch_count will be zero if neither has an array. + */ + if (in_batch_count && out_batch_count) + batch_count = min(in_batch_count, out_batch_count); + else + batch_count = in_batch_count | out_batch_count; + + in_total_size = ALIGN(in_size + (in_entry_size * batch_count), 8); + out_total_size = ALIGN(out_size + (out_entry_size * batch_count), 8); + + space = *this_cpu_ptr(hyperv_pcpu_arg); + if (input) { + *(void **)input = space; + if (space) + /* Zero the fixed size portion, not any array portion */ + memset(space, 0, ALIGN(in_size, 8)); + } + + if (output) { + if (in_total_size + out_total_size <= HV_HYP_PAGE_SIZE) { + offset = in_total_size; + } else { + offset = HV_HYP_PAGE_SIZE; + /* Need more than 1 page, but only 1 was allocated */ + BUILD_BUG_ON(HV_HVCALL_ARG_PAGES == 1); + } + *(void **)output = space + offset; + } + + return batch_count; +} + +/* Wrappers for some of the simpler cases with only input, or with no arrays */ +#define hv_hvcall_in(input, in_size) \ + hv_hvcall_inout_array(input, in_size, 0, NULL, 0, 0) + +#define hv_hvcall_inout(input, in_size, output, out_size) \ + hv_hvcall_inout_array(input, in_size, 0, output, out_size, 0) + +#define hv_hvcall_in_array(input, in_size, in_entry_size) \ + hv_hvcall_inout_array(input, in_size, in_entry_size, NULL, 0, 0) + /* Generate the guest OS identifier as described in the Hyper-V TLFS */ static inline u64 hv_generate_guest_id(u64 kernel_version) {