
[PATCH-for-4.17] xen: fix generated code for calling hypercall handlers

Message ID 20221103163631.13145-1-jgross@suse.com (mailing list archive)
State New, archived

Commit Message

Jürgen Groß Nov. 3, 2022, 4:36 p.m. UTC
The code generated for the call_handlers_*() macros needs to avoid
undefined behavior when multiple handlers share the same priority.
The issue is the hypercall number being fed unverified into the macros
and then used to set a mask via "mask = 1ULL << <hypercall-number>".

Avoid a shift amount of more than 63 by setting mask to zero in case
the hypercall number is too large.

Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/scripts/gen_hypercall.awk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jan Beulich Nov. 3, 2022, 4:45 p.m. UTC | #1
On 03.11.2022 17:36, Juergen Gross wrote:
> The code generated for the call_handlers_*() macros needs to avoid
> undefined behavior when multiple handlers share the same priority.
> The issue is the hypercall number being fed unverified into the macros
> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
> 
> Avoid a shift amount of more than 63 by setting mask to zero in case
> the hypercall number is too large.
> 
> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit preferably with ...

> --- a/xen/scripts/gen_hypercall.awk
> +++ b/xen/scripts/gen_hypercall.awk
> @@ -263,7 +263,7 @@ END {
>          printf("#define call_handlers_%s(num, ret, a1, a2, a3, a4, a5) \\\n", ca);
>          printf("({ \\\n");
>          if (need_mask)
> -            printf("    uint64_t mask = 1ULL << num; \\\n");
> +            printf("    uint64_t mask = (num > 63) ? 0 : 1ULL << num; \\\n");

... "num" also properly parenthesized (this is part of a macro definition
in the output after all). Easy enough to take care of while committing.

Jan
Henry Wang Nov. 3, 2022, 4:50 p.m. UTC | #2
Hi Juergen and Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Subject: Re: [PATCH-for-4.17] xen: fix generated code for calling hypercall
> handlers
> 
> On 03.11.2022 17:36, Juergen Gross wrote:
> > The code generated for the call_handlers_*() macros needs to avoid
> > undefined behavior when multiple handlers share the same priority.
> > The issue is the hypercall number being fed unverified into the macros
> > and then used to set a mask via "mask = 1ULL << <hypercall-number>".
> >
> > Avoid a shift amount of more than 63 by setting mask to zero in case
> > the hypercall number is too large.
> >
> > Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
> > Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

With...

> albeit preferably with ...
> 
> > --- a/xen/scripts/gen_hypercall.awk
> > +++ b/xen/scripts/gen_hypercall.awk
> > @@ -263,7 +263,7 @@ END {
> >          printf("#define call_handlers_%s(num, ret, a1, a2, a3, a4, a5) \\\n",
> ca);
> >          printf("({ \\\n");
> >          if (need_mask)
> > -            printf("    uint64_t mask = 1ULL << num; \\\n");
> > +            printf("    uint64_t mask = (num > 63) ? 0 : 1ULL << num; \\\n");
> 
> ... "num" also properly parenthesized (this is part of a macro definition
> in the output after all). Easy enough to take care of while committing.

...Jan's comment fixed (or agreement to let the committer fix on commit).

Kind regards,
Henry

> 
> Jan
Andrew Cooper Nov. 4, 2022, 5:01 a.m. UTC | #3
On 03/11/2022 16:36, Juergen Gross wrote:
> The code generated for the call_handlers_*() macros needs to avoid
> undefined behavior when multiple handlers share the same priority.
> The issue is the hypercall number being fed unverified into the macros
> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>
> Avoid a shift amount of more than 63 by setting mask to zero in case
> the hypercall number is too large.
>
> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
> Signed-off-by: Juergen Gross <jgross@suse.com>

This is not a suitable fix.  There being a security issue is just the
tip of the iceberg. 

The changes broke the kexec_op() ABI and this is a blocking regression
vs 4.16.

In lieu of having time to do
https://gitlab.com/xen-project/xen/-/issues/93, here's the abridged list
of errors

The series claims "This is beneficial to performance and avoids
speculation issues.", c/s 8523851dbc4.

That half sentence is literally the sum total of justification given for
this being related to speculation.

The other half of the sentence claims performance.  But no performance
testing was done; the cover letter talks about one test with specifics,
but it describes a scenario where the delta was a handful of cycles
difference, as one part in multi-millions, probably billions.  There is
no plausible way that whatever raw data led to the "<1% improvement"
claim was statistically significant.

The reason a performance improvement cannot be measured is that a big
out-of-order core can easily absorb the hit in the shadow of other
operations.   Smaller cores cannot, and I'm confident that adequate
performance testing would have demonstrated this.

Unaddressed is the code bloat from the change; relevant because it is
the negative half of the tradeoff on what is allegedly a net improvement
on a fastpath.  Actually trying to reason about the code bloat would
have highlighted why it's rather important that the logic be implemented
as a real function rather than a macro.

Also unaddressed is whether the multi-nesting even has any utility, and
if it does, what it does to the other kinds of workloads.

Unaddressed too is the impact from XSAs 398 and 407 which, as members of
the security team, you had substantially more exposure to than most.


Taking a step back from low level issues.

This series introduces a NIH domain-specific language for describing
hypercalls, but lacking in any documentation.  As an exercise to others,
time how long it takes you to compile a hypervisor with a new
hypercall that takes e.g. one integer and one pointer parameter.  There
should be a whole lot more acks on that patch for it to be considered to
have an adequate review.

Somewhere (I can't recall where, but it's 4 in the morning so I'm not
looking for it now), a statement was made that if issues were found they
could be addressed going forwards.  But the series was committed without
any possibility for anyone to perform the testing requested of the
original submission.

There was one redeeming property of the series, and yet there was no
discussion anywhere about function pointer casts.  But given that the
premise was disputed to begin with, and the performance testing that
stood an outside chance of countering the dispute was ignored, and
/then/ that my objections were disregarded and the series committed
without calling a vote, I have to say that I'm very displeased with how
this went.

~Andrew
Jürgen Groß Nov. 4, 2022, 5:26 a.m. UTC | #4
On 04.11.22 06:01, Andrew Cooper wrote:
> On 03/11/2022 16:36, Juergen Gross wrote:
>> The code generated for the call_handlers_*() macros needs to avoid
>> undefined behavior when multiple handlers share the same priority.
>> The issue is the hypercall number being fed unverified into the macros
>> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>>
>> Avoid a shift amount of more than 63 by setting mask to zero in case
>> the hypercall number is too large.
>>
>> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> This is not a suitable fix.  There being a security issue is just the
> tip of the iceberg.
> 
> The changes broke the kexec_op() ABI and this is a blocking regression
> vs 4.16.

It would really be beneficial if you would just tell us what the issues
are instead of voicing some vague concerns and then dropping off into
silence again when asked (sometimes multiple times) what the real
problems are.

> In lieu of having time to do
> https://gitlab.com/xen-project/xen/-/issues/93, here's the abridged list
> of errors
> 
> The series claims "This is beneficial to performance and avoids
> speculation issues.", c/s 8523851dbc4.
> 
> That half sentence is literally the sum total of justification given for
> this being related to speculation.
> 
> The other half of the sentence claims performance.  But no performance
> testing was done; the cover letter talks about one test with specifics,
> but it describes a scenario where the delta was a handful of cycles
> difference, as one part in multi-millions, probably billions.  There is
> no plausible way that whatever raw data led to the "<1% improvement"
> claim was statistically significant.

Yes, and you told me to do some more performance testing with XenServer
and you even didn't respond to queries regarding the state of that
testing.

> The reason a performance improvement cannot be measured is that a big
> out-of-order core can easily absorb the hit in the shadow of other
> operations.   Smaller cores cannot, and I'm confident that adequate
> performance testing would have demonstrated this.
> 
> Unaddressed is the code bloat from the change; relevant because it is
> the negative half of the tradeoff on what is allegedly a net improvement
> on a fastpath.  Actually trying to reason about the code bloat would
> have highlighted why it's rather important that the logic be implemented
> as a real function rather than a macro.

You had several weeks to bring up that concern, yet you didn't.

> Also unaddressed is whether the multi-nesting even has any utility, and
> if it does, what it does to the other kinds of workloads.
> 
> Unaddressed too is the impact from XSAs 398 and 407 which, as members of
> the security team, you had substantially more exposure to than most.
> 
> 
> Taking a step back from low level issues.
> 
> This series introduces a NIH domain-specific language for describing
> hypercalls, but lacking in any documentation.  As an exercise to others,
> time how long it takes you to compile a hypervisor with a new
> hypercall that takes e.g. one integer and one pointer parameter.  There
> should be a whole lot more acks on that patch for it to be considered to
> have an adequate review.
> 
> Somewhere (I can't recall where, but it's 4 in the morning so I'm not
> looking for it now), a statement was made that if issues were found they
> could be addressed going forwards.  But the series was committed without
> any possibility for anyone to perform the testing requested of the
> original submission.

Funny statement.

The series was pending for being committed for several months, I did ping
multiple times for any feedback (especially you) and you didn't even
respond with a "I'll come back to it later". You just behaved like
/dev/null. That was discussed even in the community call, where the
decision was taken to finally apply the series with you not even reacting
in a minimal way.

> There was one redeeming property of the series, and yet there was no
> discussion anywhere about function pointer casts.  But given that the
> premise was disputed to begin with, and the performance testing that
> stood an outside chance of countering the dispute was ignored, and
> /then/ that my objections were disregarded and the series committed
> without calling a vote, I have to say that I'm very displeased with how
> this went.

Yes, me too.

Being asked for specific concerns multiple times, not reacting, and then
coming back after months complaining that you have been ignored is just disgusting.


Juergen
Jan Beulich Nov. 4, 2022, 7:36 a.m. UTC | #5
On 04.11.2022 06:01, Andrew Cooper wrote:
> On 03/11/2022 16:36, Juergen Gross wrote:
>> The code generated for the call_handlers_*() macros needs to avoid
>> undefined behavior when multiple handlers share the same priority.
>> The issue is the hypercall number being fed unverified into the macros
>> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>>
>> Avoid a shift amount of more than 63 by setting mask to zero in case
>> the hypercall number is too large.
>>
>> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> This is not a suitable fix.  There being a security issue is just the
> tip of the iceberg. 

I'm going to commit that change nevertheless. _If_ we decide to revert,
reverting this change as well is going to be easy enough.

Apart from this I can only fully support Jürgen's earlier reply, adding
that iirc it was in part you supporting and/or motivating the change
originally. It is perfectly fine for you to have changed your mind,
but it doesn't help at all if you indicate you did but don't say why.
Even for the change done here you left everyone guessing where the
problem would be that you did hint at in, afaict, merely a private
discussion with George originally. I'm glad it was enough of a hint
for Jürgen to spot the issue. But then ...

> The changes broke the kexec_op() ABI and this is a blocking regression
> vs 4.16.

... you repeat the same pattern here.

> There was one redeeming property of the series, and yet there was no
> discussion anywhere about function pointer casts.  But given that the
> premise was disputed to begin with, and the performance testing that
> stood an outside chance of countering the dispute was ignored, and
> /then/ that my objections were disregarded and the series committed
> without calling a vote, I have to say that I'm very displeased with how
> this went.

For there to be a need to vote, there needs to be an active discussion,
laying out all the issues, such that everyone who would eventually have
to vote can actually form an opinion. We were quite far from that point.

In fact, seeing the repeating pattern of you voicing objections without
then following up, in the "Progressing of pending patch series" design
session in September in Cambridge I did suggest that objections not
followed up on should expire after a reasonable amount of time. Just giving a
vague "no" (not always quite as brief, but we had extreme cases) should
not be sufficient to block a patch or series indefinitely. This is still
only a suggestion of mine, not the least because it's not really clear to
me how to progress this into something more formal, but there were no
objections to such a model there nor in reply to the notes that were sent
afterwards.

Jan
George Dunlap Nov. 4, 2022, 9:04 p.m. UTC | #6
> On 4 Nov 2022, at 05:01, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> 
> The series claims "This is beneficial to performance and avoids
> speculation issues.", c/s 8523851dbc4.
> 
> That half sentence is literally the sum total of justification given for
> this being related to speculation.

The cover letter, written on 15 Oct 2021, mentions “avoid[ing] indirect function calls on the hypercall path”.  Internal security@ discussions from the time show that we were talking about Spectre-BHB (AKA BHI) and its impact on function pointers, specifically those in the hypercall and exception dispatch.  Given that Spectre-BHB wasn’t made public until March 2022, it would have been a violation of the embargo for Jürgen to go into more detail at that time.

It appears that your view on whether hypercall function call tables are a vulnerable surface of attack has changed.  But given that you once believed they needed protecting, it’s not unreasonable for other people to think that they may need protecting; and given that it’s reasonable to think that they may need protecting, you should at least give a *little bit* of a justification for why you believe they don’t, rather than simply falling back to, “There’s no evidence”.

 -George
George Dunlap Nov. 9, 2022, 8:16 p.m. UTC | #7
> On 4 Nov 2022, at 05:01, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
> 
> On 03/11/2022 16:36, Juergen Gross wrote:
>> The code generated for the call_handlers_*() macros needs to avoid
>> undefined behavior when multiple handlers share the same priority.
>> The issue is the hypercall number being fed unverified into the macros
>> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>> 
>> Avoid a shift amount of more than 63 by setting mask to zero in case
>> the hypercall number is too large.
>> 
>> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> This is not a suitable fix.  There being a security issue is just the
> tip of the iceberg.

At the x86 Maintainer’s meeting on Monday, we (Andrew, Jan, and I) talked about this patch.  Here is my summary of the conversation (with the caveat that I may get some of the details wrong).

The proposed benefits of the series are:

1. By removing indirect calls, it removes those as a “speculative attack surface”.

2. By removing indirect calls, it provides some performance benefit, since indirect calls  require an extra memory fetch.

3. It avoids casting function pointers to function pointers of a different type.  Our current practice is *technically* UB, and is incompatible with some hardware safety mechanisms which we may want to take advantage of at some point in the future; the series addresses both.

There were two incidental technical problems pointed out:

1. A potential shift of more than 63 bits, which is UB; this has been fixed.

2. The prototype for the kexec_op call was changed from unsigned long to unsigned int; this is an ABI change which will cause differing behavior.  Jan will be looking at how he can fix this, now that it’s been noted.

But the more fundamental costs include:

1. The code is much more difficult now to reason about

2. The code is much larger

3. The long if/else chain could theoretically help hypercalls at the top of the chain, but would definitely begin to hurt hypercalls at the bottom of the chain; and the more hypercalls we add, the more of a theoretical performance penalty this will have

4. By using 64-bit masks, the implementation limits the number of hypercalls to 64; a number we are likely to exceed if we implement ABIv2 to be compatible with AMD SEV.

Additionally, there is a question about whether some of the alleged benefits actually help:

1. On AMD processors, we enable IBRS, which completely removes indirect calls as a speculative attack surface already.  And on Intel processors, this attack surface has already been significantly reduced.  So removing indirect calls is not as important an issue.

2. Normal branches are *also* a surface of speculative attacks; so even apart from the above, all this series does is change one potential attack surface for another one.

3. When we analyze theoretical performance with deep CPU pipelines and speculation in mind, the theoretical disadvantage of indirect branches goes away; and depending on the hardware, there is a theoretical performance degradation.

4. From a practical perspective, the performance tests are very much insufficient to show either that this is an improvement, or that does not cause a performance regression.  To show that there hasn’t been a performance degradation, a battery of tests needs to be done on hardware from a variety of different vendors and cpu generations, since each of them will have different properties after all speculative mitigations have been applied.

So the argument is as follows:

There is no speculative benefit for the series; there is insufficient performance evidence, either to justify a performance benefit or to allay doubts about a performance regression; and what benefit there is, is insufficient to counterbalance the costs, and so the series should be reverted.

At the end of the discussion, Jan and I agreed that Andrew had made a good case for the series to be removed at some point.  The discussion needs to be concluded on the list, naturally; and if there is a consensus to remove the series, the next question would be whether we should revert it now, before 4.17.0, or wait until after the release and revert it then (perhaps with a backport to 4.17.1).

(Jan and Andy, please let me know if I’ve misunderstood anything from that meeting.)

I have more details regarding the technical aspects above, but this email is already rather long.  Let me know if you need more details and I’ll try to fill them in.

Thoughts?
 -George
Jürgen Groß Nov. 10, 2022, 6:25 a.m. UTC | #8
On 09.11.22 21:16, George Dunlap wrote:
> 
>> On 4 Nov 2022, at 05:01, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>>
>> On 03/11/2022 16:36, Juergen Gross wrote:
>>> The code generated for the call_handlers_*() macros needs to avoid
>>> undefined behavior when multiple handlers share the same priority.
>>> The issue is the hypercall number being fed unverified into the macros
>>> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>>>
>>> Avoid a shift amount of more than 63 by setting mask to zero in case
>>> the hypercall number is too large.
>>>
>>> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> This is not a suitable fix.  There being a security issue is just the
>> tip of the iceberg.
> 
> At the x86 Maintainer’s meeting on Monday, we (Andrew, Jan, and I) talked about this patch.  Here is my summary of the conversation (with the caveat that I may get some of the details wrong).
> 
> The proposed benefits of the series are:
> 
> 1. By removing indirect calls, it removes those as a “speculative attack surface”.
> 
> 2. By removing indirect calls, it provides some performance benefit, since indirect calls  require an extra memory fetch.
> 
> 3. It avoids casting function pointers to function pointers of a different type.  Our current practice is *technically* UB, and is incompatible with some hardware safety mechanisms which we may want to take advantage of at some point in the future; the series addresses both.
> 
> There were two incidental technical problems pointed out:
> 
> 1. A potential shift of more than 63 bits, which is UB; this has been fixed.
> 
> 2. The prototype for the kexec_op call was changed from unsigned long to unsigned int; this is an ABI change which will cause differing behavior.  Jan will be looking at how he can fix this, now that it’s been noted.
> 
> But the more fundamental costs include:
> 
> 1. The code is much more difficult now to reason about
> 
> 2. The code is much larger

The code to be maintained is smaller. The overall diffstat of the series shows
that more lines were deleted than added.

The generated code is larger, but this applies to other changes (new compiler,
modified build settings, ...) often enough, too.

> 3. The long if/else chain could theoretically help hypercalls at the top of the chain, but would definitely begin to hurt hypercalls at the bottom of the chain; and the more hypercalls we add, the more of a theoretical performance penalty this will have
> 
> 4. By using 64-bit masks, the implementation limits the number of hypercalls to 64; a number we are likely to exceed if we implement ABIv2 to be compatible with AMD SEV.

This is solvable at one central place.

> Additionally, there is a question about whether some of the alleged benefits actually help:
> 
> 1. On AMD processors, we enable IBRS, which completely removes indirect calls as a speculative attack surface already.  And on Intel processors, this attack surface has already been significantly reduced.  So removing indirect calls is not as important an issue.
> 
> 2. Normal branches are *also* a surface of speculative attacks; so even apart from the above, all this series does is change one potential attack surface for another one.

History has shown that speculative attacks via indirect branches are much
harder to solve. New ones coming up will probably have the same problem.

> 3. When we analyze theoretical performance with deep CPU pipelines and speculation in mind, the theoretical disadvantage of indirect branches goes away; and depending on the hardware, there is a theoretical performance degradation.
> 
> 4. From a practical perspective, the performance tests are very much insufficient to show either that this is an improvement, or that does not cause a performance regression.  To show that there hasn’t been a performance degradation, a battery of tests needs to be done on hardware from a variety of different vendors and cpu generations, since each of them will have different properties after all speculative mitigations have been applied.

This argument is true for many changes we are doing.

The performance impact might be positive or negative. With the possibility
of priorities the impact can be controlled, though.

> So the argument is as follows:
> 
> There is no speculative benefit for the series; there is insufficient performance evidence, either to justify a performance benefit or to allay doubts about a performance regression; and what benefit there is, is insufficient to counterbalance the costs, and so the series should be reverted.
> 
> At the end of the discussion, Jan and I agreed that Andrew had made a good case for the series to be removed at some point.  The discussion needs to be concluded on the list, naturally; and if there is a consensus to remove the series, the next question would be whether we should revert it now, before 4.17.0, or wait until after the release and revert it then (perhaps with a backport to 4.17.1).
> 
> (Jan and Andy, please let me know if I’ve misunderstood anything from that meeting.)

I'm not against reverting.

I just wanted to share my thoughts on above reasoning.


Juergen
Jan Beulich Nov. 10, 2022, 8:09 a.m. UTC | #9
On 09.11.2022 21:16, George Dunlap wrote:
>> On 4 Nov 2022, at 05:01, Andrew Cooper <Andrew.Cooper3@citrix.com> wrote:
>> On 03/11/2022 16:36, Juergen Gross wrote:
>>> The code generated for the call_handlers_*() macros needs to avoid
>>> undefined behavior when multiple handlers share the same priority.
>>> The issue is the hypercall number being fed unverified into the macros
>>> and then used to set a mask via "mask = 1ULL << <hypercall-number>".
>>>
>>> Avoid a shift amount of more than 63 by setting mask to zero in case
>>> the hypercall number is too large.
>>>
>>> Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> This is not a suitable fix.  There being a security issue is just the
>> tip of the iceberg.
> 
> At the x86 Maintainer’s meeting on Monday, we (Andrew, Jan, and I) talked about this patch.  Here is my summary of the conversation (with the caveat that I may get some of the details wrong).

Just a couple of remarks, mainly to extend context:

> The proposed benefits of the series are:
> 
> 1. By removing indirect calls, it removes those as a “speculative attack surface”.
> 
> 2. By removing indirect calls, it provides some performance benefit, since indirect calls  require an extra memory fetch.
> 
> 3. It avoids casting function pointers to function pointers of a different type.  Our current practice is *technically* UB, and is incompatible with some hardware safety mechanisms which we may want to take advantage of at some point in the future; the series addresses both.
> 
> There were two incidental technical problems pointed out:
> 
> 1. A potential shift of more than 63 bits, which is UB; this has been fixed.
> 
> 2. The prototype for the kexec_op call was changed from unsigned long to unsigned int; this is an ABI change which will cause differing behavior.  Jan will be looking at how he can fix this, now that it’s been noted.

Patch was already sent and is now fully acked. Will go in later this morning.

> But the more fundamental costs include:
> 
> 1. The code is much more difficult now to reason about
> 
> 2. The code is much larger
> 
> 3. The long if/else chain could theoretically help hypercalls at the top of the chain, but would definitely begin to hurt hypercalls at the bottom of the chain; and the more hypercalls we add, the more of a theoretical performance penalty this will have

After Andrew's remark on how branch history works I've looked at Intel's
ORM, finding that they actually recommend a hybrid approach: Frequently
used numbers dealt with separately, infrequently used ones dealt with by
a common indirect call.

> 4. By using 64-bit masks, the implementation limits the number of hypercalls to 64; a number we are likely to exceed if we implement ABIv2 to be compatible with AMD SEV.

This very much depends on how we encode the new hypercall numbers. In my
proposal a single bit is used, and handlers remain the same. Therefore in
that model there wouldn't really be an extension of hypercall numbers to
cover here.

> Additionally, there is a question about whether some of the alleged benefits actually help:
> 
> 1. On AMD processors, we enable IBRS, which completely removes indirect calls as a speculative attack surface already.  And on Intel processors, this attack surface has already been significantly reduced.  So removing indirect calls is not as important an issue.
> 
> 2. Normal branches are *also* a surface of speculative attacks; so even apart from the above, all this series does is change one potential attack surface for another one.
> 
> 3. When we analyze theoretical performance with deep CPU pipelines and speculation in mind, the theoretical disadvantage of indirect branches goes away; and depending on the hardware, there is a theoretical performance degradation.

I'm inclined to change this to "may go away". As Andrew said on the call, an
important criteria for the performance of indirect calls is how long it takes
to recognize misprediction, and hence how much work needs to be thrown away
and re-done. Which in turn means there's a more significant impact here when
the rate of mis-predictions is higher.

> 4. From a practical perspective, the performance tests are very much insufficient to show either that this is an improvement, or that does not cause a performance regression.  To show that there hasn’t been a performance degradation, a battery of tests needs to be done on hardware from a variety of different vendors and cpu generations, since each of them will have different properties after all speculative mitigations have been applied.
> 
> So the argument is as follows:
> 
> There is no speculative benefit for the series; there is insufficient performance evidence, either to justify a performance benefit or to allay doubts about a performance regression; and what benefit there is, is insufficient to counterbalance the costs, and so the series should be reverted.
> 
> At the end of the discussion, Jan and I agreed that Andrew had made a good case for the series to be removed at some point.  The discussion needs to be concluded on the list, naturally; and if there is a consensus to remove the series, the next question would be whether we should revert it now, before 4.17.0, or wait until after the release and revert it then (perhaps with a backport to 4.17.1).

As per above a 3rd option may want considering: Only partially going back to
the original model of using indirect calls (e.g. for all hypercalls which
aren't explicitly assigned a priority).

It's not just this which leaves me thinking that we shouldn't revert now,
but instead take our time to decide what is going to be best long term. If
then we still decide to fully revert, it can still be an option to do the
revert also for 4.17.x (x > 0).

Jan

Patch

diff --git a/xen/scripts/gen_hypercall.awk b/xen/scripts/gen_hypercall.awk
index 34840c514f..08b2f70bdb 100644
--- a/xen/scripts/gen_hypercall.awk
+++ b/xen/scripts/gen_hypercall.awk
@@ -263,7 +263,7 @@  END {
         printf("#define call_handlers_%s(num, ret, a1, a2, a3, a4, a5) \\\n", ca);
         printf("({ \\\n");
         if (need_mask)
-            printf("    uint64_t mask = 1ULL << num; \\\n");
+            printf("    uint64_t mask = (num > 63) ? 0 : 1ULL << num; \\\n");
         printf("    ");
         for (pl = 1; pl <= n_prios[ca]; pl++) {
             if (prios[ca, p_list[pl]] > 1) {