diff mbox series

x86: guard against straight-line speculation past RET

Message ID deb41469-37b1-f2da-cc76-70720fe85dbe@suse.com (mailing list archive)
State Superseded
Headers show
Series x86: guard against straight-line speculation past RET | expand

Commit Message

Jan Beulich Aug. 24, 2020, 12:50 p.m. UTC
Under certain conditions CPUs can speculate into the instruction stream
past a RET instruction. Guard against this just like 3b7dab93f240
("x86/spec-ctrl: Protect against CALL/JMP straight-line speculation")
did - by inserting an "INT $3" insn. It's merely the mechanics of how to
achieve this that differ: A pair of macros gets introduced to post-
process RET insns issued by the compiler (or living in assembly files).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Should this depend on CONFIG_SPECULATIVE_HARDEN_BRANCH?
---
This depends on the "x86: some assembler macro rework" series posted
over a month ago.

Comments

Andrew Cooper Sept. 4, 2020, 6:18 p.m. UTC | #1
On 24/08/2020 13:50, Jan Beulich wrote:
> --- a/xen/include/asm-x86/asm-defns.h
> +++ b/xen/include/asm-x86/asm-defns.h
> @@ -50,3 +50,19 @@
>  .macro INDIRECT_JMP arg:req
>      INDIRECT_BRANCH jmp \arg
>  .endm
> +
> +/*
> + * To guard against speculation past RET, insert a breakpoint insn
> + * immediately after them.
> + */
> +.macro ret operand:vararg
> +    ret$ \operand
> +.endm
> +.macro ret$ operand:vararg
> +    .purgem ret
> +    ret \operand
> +    int $3
> +    .macro ret operand:vararg
> +        ret$ \\(operand)
> +    .endm
> +.endm

Several observations.  First, clang chokes on this:

<instantiation>:2:9: error: unknown token in expression
    ret \\(operand)
        ^

Second, you mean int3 rather than int $3.  By convention, they are
synonymous, but the second one really ought to be the two byte encoding,
rather than the single byte encoding, and we really want the single byte
version for code volume reasons.

Third, there is a huge quantity of complexity for a form of the
instruction we don't use.  Instead:

.macro ret operand:vararg
    .ifnb \operand
        .error "TODO - speculation safety for 'ret $imm'"
    .endif

    .byte 0xc3
    int3
.endm

is much simpler, and compatible with both GCC and Clang.

Almost...

Clang doesn't actually expand the macro for ret instructions, so a Clang
build of Xen only ends up getting protected in the assembly files.

The following experiment demonstrates the issue:

$ cat ret.c
asm (".macro ret\n\t"
     ".error \"foo\"\n\t"
     ".endm\n\t");
void foo(void) {}

$ gcc -O3 -c ret.c -o ret.o && objdump -d ret.o
/tmp/ccf8hkyN.s: Assembler messages:
/tmp/ccf8hkyN.s:16: Error: foo

$ clang-10 -O3 -c ret.c -o ret.o && objdump -d ret.o

ret.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:    c3                       retq


Worse, -no-integrated-as doesn't immediately help, even though it
invokes $(AS).

I tracked that down to the difference between ret and retq, which
highlights an assumption about GCC which may not remain true in the future.

Adding a second macro covering retq fixes the scenario in combination
with -no-integrated-as.

So overall I think we can make a safe binary with a clang build. 
However, it is at the expense of the integrated assembler, which I
believe is now mandatory for LTO, and control-flow integrity, neither of
which we want to lose in the long term.

~Andrew
Jan Beulich Sept. 7, 2020, 9:25 a.m. UTC | #2
On 04.09.2020 20:18, Andrew Cooper wrote:
> On 24/08/2020 13:50, Jan Beulich wrote:
>> --- a/xen/include/asm-x86/asm-defns.h
>> +++ b/xen/include/asm-x86/asm-defns.h
>> @@ -50,3 +50,19 @@
>>  .macro INDIRECT_JMP arg:req
>>      INDIRECT_BRANCH jmp \arg
>>  .endm
>> +
>> +/*
>> + * To guard against speculation past RET, insert a breakpoint insn
>> + * immediately after them.
>> + */
>> +.macro ret operand:vararg
>> +    ret$ \operand
>> +.endm
>> +.macro ret$ operand:vararg
>> +    .purgem ret
>> +    ret \operand
>> +    int $3
>> +    .macro ret operand:vararg
>> +        ret$ \\(operand)
>> +    .endm
>> +.endm
> 
> Several observations.  First, clang chokes on this:
> 
> <instantiation>:2:9: error: unknown token in expression
>     ret \\(operand)
>         ^

Must be clang more recent than the 5.x one I've tested with; likely
because there we end up using -mno-integrated-as.

> Second, you mean int3 rather than int $3.  By convention, they are
> synonymous, but the second one really ought to be the two byte encoding,
> rather than the single byte encoding, and we really want the single byte
> version for code volume reasons.

Well, no, I didn't mean "int3", but I've switched nevertheless, just
for consistency with the earlier change of yours referenced in the
description. To me "int3" has only ever been a kludge. Assemblers
I've grown up with don't know such a mnemonic. Nor did Intel
originally document it, and AMD still doesn't.

> Third, there is a huge quantity of complexity for a form of the
> instruction we don't use.

The complexity isn't with handling the possible immediate operand,
but with the need to override the "ret" insn, and then to transiently
cancel this override.

>  Instead:
> 
> .macro ret operand:vararg
>     .ifnb \operand
>         .error "TODO - speculation safety for 'ret $imm'"
>     .endif
> 
>     .byte 0xc3
>     int3
> .endm
> 
> is much simpler, and compatible with both GCC and Clang.

I really wish to avoid .byte for code emission whenever possible.
It subverts the assembler applying sanity checks. This may not be
overly relevant here, but then we would still better avoid setting
precedents. However, if clang can't be made work without going
this route, so be it.

> Almost...
> 
> Clang doesn't actually expand the macro for ret instructions, so a Clang
> build of Xen only ends up getting protected in the assembly files.
> 
> The following experiment demonstrates the issue:
> 
> $ cat ret.c
> asm (".macro ret\n\t"
>      ".error \"foo\"\n\t"
>      ".endm\n\t");
> void foo(void) {}
> 
> $ gcc -O3 -c ret.c -o ret.o && objdump -d ret.o
> /tmp/ccf8hkyN.s: Assembler messages:
> /tmp/ccf8hkyN.s:16: Error: foo
> 
> $ clang-10 -O3 -c ret.c -o ret.o && objdump -d ret.o
> 
> ret.o:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>    0:    c3                       retq
> 
> 
> Worse, -no-integrated-as doesn't immediately help, even though it
> invokes $(AS).
> 
> I tracked that down to the difference between ret and retq, which
> highlights an assumption about GCC which may not remain true in the future.
> 
> Adding a second macro covering retq fixes the scenario in combination
> with -no-integrated-as.

Ah, yes, I should of course have thought of retq. Albeit as per
above - generated code looks fine here when using clang 5.

> So overall I think we can make a safe binary with a clang build. 
> However, it is at the expense of the integrated assembler, which I
> believe is now mandatory for LTO, and control-flow integrity, neither of
> which we want to lose in the long term.

Why at this expense? Are you saying that even when going the .byte
route and even with very new clang one has to force
-mno-integrated-as?

Jan
Jan Beulich Sept. 7, 2020, 10:14 a.m. UTC | #3
On 07.09.2020 11:25, Jan Beulich wrote:
> On 04.09.2020 20:18, Andrew Cooper wrote:
>> Clang doesn't actually expand the macro for ret instructions, so a Clang
>> build of Xen only ends up getting protected in the assembly files.
>>
>> The following experiment demonstrates the issue:
>>
>> $ cat ret.c
>> asm (".macro ret\n\t"
>>      ".error \"foo\"\n\t"
>>      ".endm\n\t");
>> void foo(void) {}
>>
>> $ gcc -O3 -c ret.c -o ret.o && objdump -d ret.o
>> /tmp/ccf8hkyN.s: Assembler messages:
>> /tmp/ccf8hkyN.s:16: Error: foo
>>
>> $ clang-10 -O3 -c ret.c -o ret.o && objdump -d ret.o
>>
>> ret.o:     file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <foo>:
>>    0:    c3                       retq
>>
>>
>> Worse, -no-integrated-as doesn't immediately help, even though it
>> invokes $(AS).
>>
>> I tracked that down to the difference between ret and retq, which
>> highlights an assumption about GCC which may not remain true in the future.
>>
>> Adding a second macro covering retq fixes the scenario in combination
>> with -no-integrated-as.
> 
> Ah, yes, I should of course have thought of retq. Albeit as per
> above - generated code looks fine here when using clang 5.

I'm sorry, I can indeed see this part of the issue. I did look at
the wrong build tree when putting together the earlier reply.

Jan
Andrew Cooper Sept. 7, 2020, 1:50 p.m. UTC | #4
On 07/09/2020 10:25, Jan Beulich wrote:
> On 04.09.2020 20:18, Andrew Cooper wrote:
>> On 24/08/2020 13:50, Jan Beulich wrote:
>>> --- a/xen/include/asm-x86/asm-defns.h
>>> +++ b/xen/include/asm-x86/asm-defns.h
>>> @@ -50,3 +50,19 @@
>>>  .macro INDIRECT_JMP arg:req
>>>      INDIRECT_BRANCH jmp \arg
>>>  .endm
>>> +
>>> +/*
>>> + * To guard against speculation past RET, insert a breakpoint insn
>>> + * immediately after them.
>>> + */
>>> +.macro ret operand:vararg
>>> +    ret$ \operand
>>> +.endm
>>> +.macro ret$ operand:vararg
>>> +    .purgem ret
>>> +    ret \operand
>>> +    int $3
>>> +    .macro ret operand:vararg
>>> +        ret$ \\(operand)
>>> +    .endm
>>> +.endm
>> Several observations.  First, clang chokes on this:
>>
>> <instantiation>:2:9: error: unknown token in expression
>>     ret \\(operand)
>>         ^
> Must be clang more recent than the 5.x one I've tested with; likely
> because there we end up using -mno-integrated-as.
>
>> Second, you mean int3 rather than int $3.  By convention, they are
>> synonymous, but the second one really ought to be the two byte encoding,
>> rather than the single byte encoding, and we really want the single byte
>> version for code volume reasons.
> Well, no, I didn't mean "int3", but I've switched nevertheless, just
> for consistency with the earlier change of yours referenced in the
> description. To me "int3" has only ever been a kludge.

Far less of a kludge than having `int $3` magically have a different
encoding to every other `int $n` instructions, which is how assemblers
behave in practice.

> Assemblers
> I've grown up with don't know such a mnemonic. Nor did Intel
> originally document it, and AMD still doesn't.

APM 3, page 413.

>> Third, there is a huge quantity of complexity for a form of the
>> instruction we don't use.
> The complexity isn't with handling the possible immediate operand,
> but with the need to override the "ret" insn, and then to transiently
> cancel this override.

What is the purpose of transiently cancelling the override?

It's not possible to pull this trick twice, so its not as if you're
falling back to a different macro rather than the plain instruction.

>>   Instead:
>>
>> .macro ret operand:vararg
>>     .ifnb \operand
>>         .error "TODO - speculation safety for 'ret $imm'"
>>     .endif
>>
>>     .byte 0xc3
>>     int3
>> .endm
>>
>> is much simpler, and compatible with both GCC and Clang.
> I really wish to avoid .byte for code emission whenever possible.
> It subverts the assembler applying sanity checks. This may not be
> overly relevant here, but then we would still better avoid setting
> precedents. However, if clang can't be made work without going
> this route, so be it.
>
>> Almost...
>>
>> Clang doesn't actually expand the macro for ret instructions, so a Clang
>> build of Xen only ends up getting protected in the assembly files.
>>
>> The following experiment demonstrates the issue:
>>
>> $ cat ret.c
>> asm (".macro ret\n\t"
>>      ".error \"foo\"\n\t"
>>      ".endm\n\t");
>> void foo(void) {}
>>
>> $ gcc -O3 -c ret.c -o ret.o && objdump -d ret.o
>> /tmp/ccf8hkyN.s: Assembler messages:
>> /tmp/ccf8hkyN.s:16: Error: foo
>>
>> $ clang-10 -O3 -c ret.c -o ret.o && objdump -d ret.o
>>
>> ret.o:     file format elf64-x86-64
>>
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <foo>:
>>    0:    c3                       retq
>>
>>
>> Worse, -no-integrated-as doesn't immediately help, even though it
>> invokes $(AS).
>>
>> I tracked that down to the difference between ret and retq, which
>> highlights an assumption about GCC which may not remain true in the future.
>>
>> Adding a second macro covering retq fixes the scenario in combination
>> with -no-integrated-as.
> Ah, yes, I should of course have thought of retq. Albeit as per
> above - generated code looks fine here when using clang 5.
>
>> So overall I think we can make a safe binary with a clang build. 
>> However, it is at the expense of the integrated assembler, which I
>> believe is now mandatory for LTO, and control-flow integrity, neither of
>> which we want to lose in the long term.
> Why at this expense? Are you saying that even when going the .byte
> route and even with very new clang one has to force
> -mno-integrated-as?

Yes, which was the whole point of providing the full transcript above.

You can't wrap an instruction with a macro with the integrated assembler.

~Andrew
Jan Beulich Sept. 7, 2020, 2:40 p.m. UTC | #5
On 07.09.2020 15:50, Andrew Cooper wrote:
> On 07/09/2020 10:25, Jan Beulich wrote:
>> On 04.09.2020 20:18, Andrew Cooper wrote:
>>> Third, there is a huge quantity of complexity for a form of the
>>> instruction we don't use.
>> The complexity isn't with handling the possible immediate operand,
>> but with the need to override the "ret" insn, and then to transiently
>> cancel this override.
> 
> What is the purpose of transiently cancelling the override?

To have access to the underlying insn again.

> It's not possible to pull this trick twice, so its not as if you're
> falling back to a different macro rather than the plain instruction.

Not sure I understand what you're saying here. The helper macro
purging and then re-instating the insn surrogate macro can be
done any number of times.

>>> Clang doesn't actually expand the macro for ret instructions, so a Clang
>>> build of Xen only ends up getting protected in the assembly files.
>>>
>>> The following experiment demonstrates the issue:
>>>
>>> $ cat ret.c
>>> asm (".macro ret\n\t"
>>>      ".error \"foo\"\n\t"
>>>      ".endm\n\t");
>>> void foo(void) {}
>>>
>>> $ gcc -O3 -c ret.c -o ret.o && objdump -d ret.o
>>> /tmp/ccf8hkyN.s: Assembler messages:
>>> /tmp/ccf8hkyN.s:16: Error: foo
>>>
>>> $ clang-10 -O3 -c ret.c -o ret.o && objdump -d ret.o
>>>
>>> ret.o:     file format elf64-x86-64
>>>
>>>
>>> Disassembly of section .text:
>>>
>>> 0000000000000000 <foo>:
>>>    0:    c3                       retq
>>>
>>>
>>> Worse, -no-integrated-as doesn't immediately help, even though it
>>> invokes $(AS).
>>>
>>> I tracked that down to the difference between ret and retq, which
>>> highlights an assumption about GCC which may not remain true in the future.
>>>
>>> Adding a second macro covering retq fixes the scenario in combination
>>> with -no-integrated-as.
>> Ah, yes, I should of course have thought of retq. Albeit as per
>> above - generated code looks fine here when using clang 5.
>>
>>> So overall I think we can make a safe binary with a clang build. 
>>> However, it is at the expense of the integrated assembler, which I
>>> believe is now mandatory for LTO, and control-flow integrity, neither of
>>> which we want to lose in the long term.
>> Why at this expense? Are you saying that even when going the .byte
>> route and even with very new clang one has to force
>> -mno-integrated-as?
> 
> Yes, which was the whole point of providing the full transcript above.
> 
> You can't wrap an instruction with a macro with the integrated assembler.

Ugly. I can't currently think of a way to address this then, for clang,
other than by them offering a suitable command line option. Question is:
Do we consider it worthwhile then to still address for gcc? Or shall I
drop the patch?

Jan
diff mbox series

Patch

--- a/xen/include/asm-x86/asm-defns.h
+++ b/xen/include/asm-x86/asm-defns.h
@@ -50,3 +50,19 @@ 
 .macro INDIRECT_JMP arg:req
     INDIRECT_BRANCH jmp \arg
 .endm
+
+/*
+ * To guard against speculation past RET, insert a breakpoint insn
+ * immediately after them.
+ */
+.macro ret operand:vararg
+    ret$ \operand
+.endm
+.macro ret$ operand:vararg
+    .purgem ret
+    ret \operand
+    int $3
+    .macro ret operand:vararg
+        ret$ \\(operand)
+    .endm
+.endm