Message ID | 9bea51eb-4fbd-b061-52d7-c6c234d060a1@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/shadow: misc tidying | expand |
On 11/01/2023 1:57 pm, Jan Beulich wrote: > Make HVM=y release build behavior prone against array overrun, by > (ab)using array_access_nospec(). This is in particular to guard against > e.g. SH_type_unused making it here unintentionally. > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > --- > v2: New. > > --- a/xen/arch/x86/mm/shadow/private.h > +++ b/xen/arch/x86/mm/shadow/private.h > @@ -27,6 +27,7 @@ > // been included... > #include <asm/page.h> > #include <xen/domain_page.h> > +#include <xen/nospec.h> > #include <asm/x86_emulate.h> > #include <asm/hvm/support.h> > #include <asm/atomic.h> > @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) > { > #ifdef CONFIG_HVM > ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); > - return sh_type_to_size[shadow_type]; > + return array_access_nospec(sh_type_to_size, shadow_type); I don't think this is warranted. First, if the commit message were accurate, then it's a problem for all arrays of size SH_type_unused, yet you've only adjusted a single instance. Secondly, if it were reliably 16 then we could do the basically-free "type &= 15;" modification. (It appears my change to do this automatically hasn't been taken yet.), but we'll end up with lfence variation here. But the value isn't attacker controlled. shadow_type always comes from Xen's metadata about the guest, not the guest itself. So I don't see how this can conceivably be a speculative issue even in principle. ~Andrew
On 12.01.2023 00:15, Andrew Cooper wrote: > On 11/01/2023 1:57 pm, Jan Beulich wrote: >> Make HVM=y release build behavior prone against array overrun, by >> (ab)using array_access_nospec(). This is in particular to guard against >> e.g. SH_type_unused making it here unintentionally. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> --- >> v2: New. >> >> --- a/xen/arch/x86/mm/shadow/private.h >> +++ b/xen/arch/x86/mm/shadow/private.h >> @@ -27,6 +27,7 @@ >> // been included... >> #include <asm/page.h> >> #include <xen/domain_page.h> >> +#include <xen/nospec.h> >> #include <asm/x86_emulate.h> >> #include <asm/hvm/support.h> >> #include <asm/atomic.h> >> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >> { >> #ifdef CONFIG_HVM >> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >> - return sh_type_to_size[shadow_type]; >> + return array_access_nospec(sh_type_to_size, shadow_type); > > I don't think this is warranted. > > First, if the commit message were accurate, then it's a problem for all > arrays of size SH_type_unused, yet you've only adjusted a single instance. Because I think the risk is higher here than for other arrays. In other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() in particular) which would trip upon inappropriate use of one of the types which are aliased to SH_type_unused when !HVM. > Secondly, if it were reliably 16 then we could do the basically-free > "type &= 15;" modification. (It appears my change to do this > automatically hasn't been taken yet.), but we'll end up with lfence > variation here. How could anything be "reliably 16"? Such enums can change at any time: They did when making HVM types conditional, and they will again when adding types needed for 5-level paging. > But the value isn't attacker controlled. shadow_type always comes from > Xen's metadata about the guest, not the guest itself. So I don't see > how this can conceivably be a speculative issue even in principle. I didn't say anything about there being a speculative issue here. It is for this very reason that I wrote "(ab)using". Jan
On 12/01/2023 9:47 am, Jan Beulich wrote: > On 12.01.2023 00:15, Andrew Cooper wrote: >> On 11/01/2023 1:57 pm, Jan Beulich wrote: >>> Make HVM=y release build behavior prone against array overrun, by >>> (ab)using array_access_nospec(). This is in particular to guard against >>> e.g. SH_type_unused making it here unintentionally. >>> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> --- >>> v2: New. >>> >>> --- a/xen/arch/x86/mm/shadow/private.h >>> +++ b/xen/arch/x86/mm/shadow/private.h >>> @@ -27,6 +27,7 @@ >>> // been included... >>> #include <asm/page.h> >>> #include <xen/domain_page.h> >>> +#include <xen/nospec.h> >>> #include <asm/x86_emulate.h> >>> #include <asm/hvm/support.h> >>> #include <asm/atomic.h> >>> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >>> { >>> #ifdef CONFIG_HVM >>> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >>> - return sh_type_to_size[shadow_type]; >>> + return array_access_nospec(sh_type_to_size, shadow_type); >> I don't think this is warranted. >> >> First, if the commit message were accurate, then it's a problem for all >> arrays of size SH_type_unused, yet you've only adjusted a single instance. > Because I think the risk is higher here than for other arrays. In > other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() > in particular) which would trip upon inappropriate use of one of the > types which are aliased to SH_type_unused when !HVM. > >> Secondly, if it were reliably 16 then we could do the basically-free >> "type &= 15;" modification. (It appears my change to do this >> automatically hasn't been taken yet.), but we'll end up with lfence >> variation here. > How could anything be "reliably 16"? Such enums can change at any time: > They did when making HVM types conditional, and they will again when > adding types needed for 5-level paging. > >> But the value isn't attacker controlled. shadow_type always comes from >> Xen's metadata about the guest, not the guest itself. So I don't see >> how this can conceivably be a speculative issue even in principle. > I didn't say anything about there being a speculative issue here. It > is for this very reason that I wrote "(ab)using". Then it is entirely wrong to be using a nospec accessor. Speculation problems are subtle enough, without false uses of the safety helpers. If you want to "harden" against runtime architectural errors, you want to up the ASSERT() to a BUG(), which will execute faster than sticking an hiding an lfence in here, and not hide what is obviously a major malfunction in the shadow pagetable logic. ~Andrew
On 12.01.2023 11:31, Andrew Cooper wrote: > On 12/01/2023 9:47 am, Jan Beulich wrote: >> On 12.01.2023 00:15, Andrew Cooper wrote: >>> On 11/01/2023 1:57 pm, Jan Beulich wrote: >>>> Make HVM=y release build behavior prone against array overrun, by >>>> (ab)using array_access_nospec(). This is in particular to guard against >>>> e.g. SH_type_unused making it here unintentionally. >>>> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> --- >>>> v2: New. >>>> >>>> --- a/xen/arch/x86/mm/shadow/private.h >>>> +++ b/xen/arch/x86/mm/shadow/private.h >>>> @@ -27,6 +27,7 @@ >>>> // been included... >>>> #include <asm/page.h> >>>> #include <xen/domain_page.h> >>>> +#include <xen/nospec.h> >>>> #include <asm/x86_emulate.h> >>>> #include <asm/hvm/support.h> >>>> #include <asm/atomic.h> >>>> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >>>> { >>>> #ifdef CONFIG_HVM >>>> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >>>> - return sh_type_to_size[shadow_type]; >>>> + return array_access_nospec(sh_type_to_size, shadow_type); >>> I don't think this is warranted. >>> >>> First, if the commit message were accurate, then it's a problem for all >>> arrays of size SH_type_unused, yet you've only adjusted a single instance. >> Because I think the risk is higher here than for other arrays. In >> other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() >> in particular) which would trip upon inappropriate use of one of the >> types which are aliased to SH_type_unused when !HVM. >> >>> Secondly, if it were reliably 16 then we could do the basically-free >>> "type &= 15;" modification. (It appears my change to do this >>> automatically hasn't been taken yet.), but we'll end up with lfence >>> variation here. >> How could anything be "reliably 16"? Such enums can change at any time: >> They did when making HVM types conditional, and they will again when >> adding types needed for 5-level paging. >> >>> But the value isn't attacker controlled. shadow_type always comes from >>> Xen's metadata about the guest, not the guest itself. So I don't see >>> how this can conceivably be a speculative issue even in principle. >> I didn't say anything about there being a speculative issue here. It >> is for this very reason that I wrote "(ab)using". > > Then it is entirely wrong to be using a nospec accessor. > > Speculation problems are subtle enough, without false uses of the safety > helpers. > > If you want to "harden" against runtime architectural errors, you want > to up the ASSERT() to a BUG(), which will execute faster than sticking > an hiding an lfence in here, and not hide what is obviously a major > malfunction in the shadow pagetable logic. I should have commented on this earlier already: What lfence are you talking about? As to BUG() - the goal here specifically is to avoid a crash in release builds, by forcing the function to return zero (via having it use array slot zero for out of range indexes). Jan
On 12/01/2023 10:42 am, Jan Beulich wrote: > On 12.01.2023 11:31, Andrew Cooper wrote: >> On 12/01/2023 9:47 am, Jan Beulich wrote: >>> On 12.01.2023 00:15, Andrew Cooper wrote: >>>> On 11/01/2023 1:57 pm, Jan Beulich wrote: >>>>> Make HVM=y release build behavior prone against array overrun, by >>>>> (ab)using array_access_nospec(). This is in particular to guard against >>>>> e.g. SH_type_unused making it here unintentionally. >>>>> >>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>> --- >>>>> v2: New. >>>>> >>>>> --- a/xen/arch/x86/mm/shadow/private.h >>>>> +++ b/xen/arch/x86/mm/shadow/private.h >>>>> @@ -27,6 +27,7 @@ >>>>> // been included... >>>>> #include <asm/page.h> >>>>> #include <xen/domain_page.h> >>>>> +#include <xen/nospec.h> >>>>> #include <asm/x86_emulate.h> >>>>> #include <asm/hvm/support.h> >>>>> #include <asm/atomic.h> >>>>> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >>>>> { >>>>> #ifdef CONFIG_HVM >>>>> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >>>>> - return sh_type_to_size[shadow_type]; >>>>> + return array_access_nospec(sh_type_to_size, shadow_type); >>>> I don't think this is warranted. >>>> >>>> First, if the commit message were accurate, then it's a problem for all >>>> arrays of size SH_type_unused, yet you've only adjusted a single instance. >>> Because I think the risk is higher here than for other arrays. In >>> other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() >>> in particular) which would trip upon inappropriate use of one of the >>> types which are aliased to SH_type_unused when !HVM. >>> >>>> Secondly, if it were reliably 16 then we could do the basically-free >>>> "type &= 15;" modification. (It appears my change to do this >>>> automatically hasn't been taken yet.), but we'll end up with lfence >>>> variation here. >>> How could anything be "reliably 16"? Such enums can change at any time: >>> They did when making HVM types conditional, and they will again when >>> adding types needed for 5-level paging. >>> >>>> But the value isn't attacker controlled. shadow_type always comes from >>>> Xen's metadata about the guest, not the guest itself. So I don't see >>>> how this can conceivably be a speculative issue even in principle. >>> I didn't say anything about there being a speculative issue here. It >>> is for this very reason that I wrote "(ab)using". >> Then it is entirely wrong to be using a nospec accessor. >> >> Speculation problems are subtle enough, without false uses of the safety >> helpers. >> >> If you want to "harden" against runtime architectural errors, you want >> to up the ASSERT() to a BUG(), which will execute faster than sticking >> an hiding an lfence in here, and not hide what is obviously a major >> malfunction in the shadow pagetable logic. > I should have commented on this earlier already: What lfence are you > talking about? The one I thought was hiding under array_access_nospec(), but I forgot we'd done the sbb trick. > As to BUG() - the goal here specifically is to avoid a > crash in release builds, by forcing the function to return zero (via > having it use array slot zero for out of range indexes). I'm very uneasy about having something this deep inside a component, which ASSERT()s correctness doing something custom like this "just to avoid crashing". There is either something important which makes this more likely than most to go wrong at runtime, or there is not. And honestly, I can't see why it is more risky at runtime. If we really do need to clamp it, then we need a brand new helper with a name that doesn't reference speculation at all. It's fine for *_nospec to reuse this helper, stating the safety of doing so, but at a code level there need to be two appropriately named helpers for their two logically-unrelated purposes. ~Andrew
On 17.01.2023 20:13, Andrew Cooper wrote: > On 12/01/2023 10:42 am, Jan Beulich wrote: >> On 12.01.2023 11:31, Andrew Cooper wrote: >>> On 12/01/2023 9:47 am, Jan Beulich wrote: >>>> On 12.01.2023 00:15, Andrew Cooper wrote: >>>>> On 11/01/2023 1:57 pm, Jan Beulich wrote: >>>>>> Make HVM=y release build behavior prone against array overrun, by >>>>>> (ab)using array_access_nospec(). This is in particular to guard against >>>>>> e.g. SH_type_unused making it here unintentionally. >>>>>> >>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>>> --- >>>>>> v2: New. >>>>>> >>>>>> --- a/xen/arch/x86/mm/shadow/private.h >>>>>> +++ b/xen/arch/x86/mm/shadow/private.h >>>>>> @@ -27,6 +27,7 @@ >>>>>> // been included... >>>>>> #include <asm/page.h> >>>>>> #include <xen/domain_page.h> >>>>>> +#include <xen/nospec.h> >>>>>> #include <asm/x86_emulate.h> >>>>>> #include <asm/hvm/support.h> >>>>>> #include <asm/atomic.h> >>>>>> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >>>>>> { >>>>>> #ifdef CONFIG_HVM >>>>>> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >>>>>> - return sh_type_to_size[shadow_type]; >>>>>> + return array_access_nospec(sh_type_to_size, shadow_type); >>>>> I don't think this is warranted. >>>>> >>>>> First, if the commit message were accurate, then it's a problem for all >>>>> arrays of size SH_type_unused, yet you've only adjusted a single instance. >>>> Because I think the risk is higher here than for other arrays. In >>>> other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() >>>> in particular) which would trip upon inappropriate use of one of the >>>> types which are aliased to SH_type_unused when !HVM. >>>> >>>>> Secondly, if it were reliably 16 then we could do the basically-free >>>>> "type &= 15;" modification. (It appears my change to do this >>>>> automatically hasn't been taken yet.), but we'll end up with lfence >>>>> variation here. >>>> How could anything be "reliably 16"? Such enums can change at any time: >>>> They did when making HVM types conditional, and they will again when >>>> adding types needed for 5-level paging. >>>> >>>>> But the value isn't attacker controlled. shadow_type always comes from >>>>> Xen's metadata about the guest, not the guest itself. So I don't see >>>>> how this can conceivably be a speculative issue even in principle. >>>> I didn't say anything about there being a speculative issue here. It >>>> is for this very reason that I wrote "(ab)using". >>> Then it is entirely wrong to be using a nospec accessor. >>> >>> Speculation problems are subtle enough, without false uses of the safety >>> helpers. >>> >>> If you want to "harden" against runtime architectural errors, you want >>> to up the ASSERT() to a BUG(), which will execute faster than sticking >>> an hiding an lfence in here, and not hide what is obviously a major >>> malfunction in the shadow pagetable logic. >> I should have commented on this earlier already: What lfence are you >> talking about? > > The one I thought was hiding under array_access_nospec(), but I forgot > we'd done the sbb trick. > >> As to BUG() - the goal here specifically is to avoid a >> crash in release builds, by forcing the function to return zero (via >> having it use array slot zero for out of range indexes). > > I'm very uneasy about having something this deep inside a component, > which ASSERT()s correctness doing something custom like this "just to > avoid crashing". > > There is either something important which makes this more likely than > most to go wrong at runtime, or there is not. And honestly, I can't see > why it is more risky at runtime. Well, okay. I did explain why I think there is an increased risk here. > If we really do need to clamp it, then we need a brand new helper with a > name that doesn't reference speculation at all. It's fine for *_nospec > to reuse this helper, stating the safety of doing so, but at a code > level there need to be two appropriately named helpers for their two > logically-unrelated purposes. I don't think anything can sensibly be made for more general purpose (not speculation related) use. Here I'm specifically utilizing that array slot 0 is being picked as the fallback slot _and_ that slot is actually suitable. This may not be the case for other arrays. Anyway - taking things together I will then simply consider the patch rejected, despite it being a seemingly easy step of hardening. Jan
On 18/01/2023 2:13 pm, Jan Beulich wrote: > On 17.01.2023 20:13, Andrew Cooper wrote: >> On 12/01/2023 10:42 am, Jan Beulich wrote: >>> On 12.01.2023 11:31, Andrew Cooper wrote: >>>> On 12/01/2023 9:47 am, Jan Beulich wrote: >>>>> On 12.01.2023 00:15, Andrew Cooper wrote: >>>>>> On 11/01/2023 1:57 pm, Jan Beulich wrote: >>>>>>> Make HVM=y release build behavior prone against array overrun, by >>>>>>> (ab)using array_access_nospec(). This is in particular to guard against >>>>>>> e.g. SH_type_unused making it here unintentionally. >>>>>>> >>>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>>>> --- >>>>>>> v2: New. >>>>>>> >>>>>>> --- a/xen/arch/x86/mm/shadow/private.h >>>>>>> +++ b/xen/arch/x86/mm/shadow/private.h >>>>>>> @@ -27,6 +27,7 @@ >>>>>>> // been included... >>>>>>> #include <asm/page.h> >>>>>>> #include <xen/domain_page.h> >>>>>>> +#include <xen/nospec.h> >>>>>>> #include <asm/x86_emulate.h> >>>>>>> #include <asm/hvm/support.h> >>>>>>> #include <asm/atomic.h> >>>>>>> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) >>>>>>> { >>>>>>> #ifdef CONFIG_HVM >>>>>>> ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); >>>>>>> - return sh_type_to_size[shadow_type]; >>>>>>> + return array_access_nospec(sh_type_to_size, shadow_type); >>>>>> I don't think this is warranted. >>>>>> >>>>>> First, if the commit message were accurate, then it's a problem for all >>>>>> arrays of size SH_type_unused, yet you've only adjusted a single instance. >>>>> Because I think the risk is higher here than for other arrays. In >>>>> other cases we have suitable build-time checks (HASH_CALLBACKS_CHECK() >>>>> in particular) which would trip upon inappropriate use of one of the >>>>> types which are aliased to SH_type_unused when !HVM. >>>>> >>>>>> Secondly, if it were reliably 16 then we could do the basically-free >>>>>> "type &= 15;" modification. (It appears my change to do this >>>>>> automatically hasn't been taken yet.), but we'll end up with lfence >>>>>> variation here. >>>>> How could anything be "reliably 16"? Such enums can change at any time: >>>>> They did when making HVM types conditional, and they will again when >>>>> adding types needed for 5-level paging. >>>>> >>>>>> But the value isn't attacker controlled. shadow_type always comes from >>>>>> Xen's metadata about the guest, not the guest itself. So I don't see >>>>>> how this can conceivably be a speculative issue even in principle. >>>>> I didn't say anything about there being a speculative issue here. It >>>>> is for this very reason that I wrote "(ab)using". >>>> Then it is entirely wrong to be using a nospec accessor. >>>> >>>> Speculation problems are subtle enough, without false uses of the safety >>>> helpers. >>>> >>>> If you want to "harden" against runtime architectural errors, you want >>>> to up the ASSERT() to a BUG(), which will execute faster than sticking >>>> an hiding an lfence in here, and not hide what is obviously a major >>>> malfunction in the shadow pagetable logic. >>> I should have commented on this earlier already: What lfence are you >>> talking about? >> The one I thought was hiding under array_access_nospec(), but I forgot >> we'd done the sbb trick. >> >>> As to BUG() - the goal here specifically is to avoid a >>> crash in release builds, by forcing the function to return zero (via >>> having it use array slot zero for out of range indexes). >> I'm very uneasy about having something this deep inside a component, >> which ASSERT()s correctness doing something custom like this "just to >> avoid crashing". >> >> There is either something important which makes this more likely than >> most to go wrong at runtime, or there is not. And honestly, I can't see >> why it is more risky at runtime. > Well, okay. I did explain why I think there is an increased risk here. > >> If we really do need to clamp it, then we need a brand new helper with a >> name that doesn't reference speculation at all. It's fine for *_nospec >> to reuse this helper, stating the safety of doing so, but at a code >> level there need to be two appropriately named helpers for their two >> logically-unrelated purposes. > I don't think anything can sensibly be made for more general purpose > (not speculation related) use. Here I'm specifically utilizing that > array slot 0 is being picked as the fallback slot _and_ that slot is > actually suitable. This may not be the case for other arrays. > > Anyway - taking things together I will then simply consider the patch > rejected, despite it being a seemingly easy step of hardening. I don't have a problem, in principle, with something like array_clamp(arr, idx) as long as it comes with a API description making it clear that it turns out-of-bounds indices into 0. But use of this in code also needs to come with a comment explaining why the piece of code is risky enough to justify it. (As much as anything else, so we can figure out when to remove it if the preconditions change.) ~Andrew
--- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -27,6 +27,7 @@ // been included... #include <asm/page.h> #include <xen/domain_page.h> +#include <xen/nospec.h> #include <asm/x86_emulate.h> #include <asm/hvm/support.h> #include <asm/atomic.h> @@ -368,7 +369,7 @@ shadow_size(unsigned int shadow_type) { #ifdef CONFIG_HVM ASSERT(shadow_type < ARRAY_SIZE(sh_type_to_size)); - return sh_type_to_size[shadow_type]; + return array_access_nospec(sh_type_to_size, shadow_type); #else ASSERT(shadow_type < SH_type_unused); return shadow_type != SH_type_none;
Make HVM=y release build behavior prone against array overrun, by (ab)using array_access_nospec(). This is in particular to guard against e.g. SH_type_unused making it here unintentionally. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v2: New.