Message ID | 20230613001108.3040476-24-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Shadow stacks for userspace | expand |
On Mon, Jun 12, 2023 at 05:10:49PM -0700, Rick Edgecombe wrote: > +Enabling arch_prctl()'s > +======================= > + > +Elf features should be enabled by the loader using the below arch_prctl's. They > +are only supported in 64 bit user applications. These operate on the features > +on a per-thread basis. The enablement status is inherited on clone, so if the > +feature is enabled on the first thread, it will propagate to all the thread's > +in an app. I appreciate it's very late in the development of this series but given that there are very similar features on both arm64 and riscv would it make sense to make these just regular prctl()s, arch_prctl() isn't used on other architectures and it'd reduce the amount of arch specific work that userspace needs to do if the interface is shared. It should also be possible to support both interfaces for x86 I guess, though that feels like askng for trouble.
* Mark Brown: > On Mon, Jun 12, 2023 at 05:10:49PM -0700, Rick Edgecombe wrote: > >> +Enabling arch_prctl()'s >> +======================= >> + >> +Elf features should be enabled by the loader using the below arch_prctl's. They >> +are only supported in 64 bit user applications. These operate on the features >> +on a per-thread basis. The enablement status is inherited on clone, so if the >> +feature is enabled on the first thread, it will propagate to all the thread's >> +in an app. > > I appreciate it's very late in the development of this series but given > that there are very similar features on both arm64 and riscv would it > make sense to make these just regular prctl()s, arch_prctl() isn't used > on other architectures and it'd reduce the amount of arch specific work > that userspace needs to do if the interface is shared. Has the Arm feature been fully disclosed? I would expect the integration with stack switching and unwinding differs between architectures even if the core mechanism is similar. It's probably tempting to handle shadow stack placement differently, too. Thanks, Florian
On Tue, Jun 13, 2023 at 02:37:18PM +0200, Florian Weimer wrote: > > I appreciate it's very late in the development of this series but given > > that there are very similar features on both arm64 and riscv would it > > make sense to make these just regular prctl()s, arch_prctl() isn't used > > on other architectures and it'd reduce the amount of arch specific work > > that userspace needs to do if the interface is shared. > Has the Arm feature been fully disclosed? Unfortunately no, it's not yet been folded into the ARM. The system registers and instructions are in the latest XML releases but that's not the full story. > I would expect the integration with stack switching and unwinding > differs between architectures even if the core mechanism is similar. > It's probably tempting to handle shadow stack placement differently, > too. Yeah, there's likely to be some differences (though given the amount of discussion on the x86 implementation I'm trying to follow the decisions there as much as reasonable on the basis that we should hopefully come to the same conclusions). It seemed worth mentioning as a needless bump, OTOH I defninitely don't see it as critical.
On Tue, 2023-06-13 at 16:15 +0100, Mark Brown wrote: > > I would expect the integration with stack switching and unwinding > > differs between architectures even if the core mechanism is > > similar. > > It's probably tempting to handle shadow stack placement > > differently, > > too. > > Yeah, there's likely to be some differences (though given the amount > of > discussion on the x86 implementation I'm trying to follow the > decisions > there as much as reasonable on the basis that we should hopefully > come > to the same conclusions). It seemed worth mentioning as a needless > bump, OTOH I defninitely don't see it as critical. Two things that came up as far as unifying the interface were: 1. The map_shadow_stack syscall x86 shadow stack does some optional pre-populating of the shadow stack memory. And in additional not all types of memory are supported (private anonymous only). This is partly to strengthen the security (which might be a cross-arch thing) and also partly due to x86's Write=0,Dirty=1 PTE bit combination. So a new syscall fit better. Some core-mm folks were not super keen on overloading mmap() to start doing things like writing to the memory being mapped, as well. 2. The arch_prctl() interface While enable and disable might be shared, there are some arch-specific stuff for x86 like enabling the WRSS instruction. For x86 all of the exercising of the kernel interface was in arch specific code, so unifying the kernel interface didn't save much on the user side. If there turns out to be some unification opportunities when everything is explored and decided on, we could have the option of tying x86's feature into it later. I think the map_shadow_stack syscall had the most debate. But the arch_prctl() was mostly agreed on IIRC. The debate was mostly with glibc folks and the riscv shadow stack developer. For my part, the thing I would really like to see unified as much as possible is at the app developer's interface (glibc/gcc). The idea would be to make it easy for app developers to know if their app supports shadow stack. There will probably be some differences, but it would be great if there was mostly the same behavior and a small list of differences. I'm thinking about the behavior of longjmp(), swapcontext(), etc.
On Tue, Jun 13, 2023 at 05:11:35PM +0000, Edgecombe, Rick P wrote: > Two things that came up as far as unifying the interface were: > 1. The map_shadow_stack syscall > x86 shadow stack does some optional pre-populating of the shadow stack > memory. And in additional not all types of memory are supported > (private anonymous only). This is partly to strengthen the security > (which might be a cross-arch thing) and also partly due to x86's > Write=0,Dirty=1 PTE bit combination. So a new syscall fit better. Some > core-mm folks were not super keen on overloading mmap() to start doing > things like writing to the memory being mapped, as well. Right, the strengthening security bits made this one look cross arch - that one wasn't worrying me. > 2. The arch_prctl() interface > While enable and disable might be shared, there are some arch-specific > stuff for x86 like enabling the WRSS instruction. > For x86 all of the exercising of the kernel interface was in arch > specific code, so unifying the kernel interface didn't save much on the > user side. If there turns out to be some unification opportunities when > everything is explored and decided on, we could have the option of > tying x86's feature into it later. > I think the map_shadow_stack syscall had the most debate. But the > arch_prctl() was mostly agreed on IIRC. The debate was mostly with > glibc folks and the riscv shadow stack developer. For arm64 we have an equivalentish thing to WRSS which lets us control if userspace can explicitly push or pop values onto the shadow stack (GCS for us) so it all maps on well - before I noticed that it was arch_prctl() I was looking at it and thinking it worked for us. At the minute I've taken the prctl() patch from the riscv series and added in a flag for writability since we just don't have an arch_prctl(), this isn't a huge deal but it just seemed like needless effort to wonder why it's different. > For my part, the thing I would really like to see unified as much as > possible is at the app developer's interface (glibc/gcc). The idea > would be to make it easy for app developers to know if their app > supports shadow stack. There will probably be some differences, but it > would be great if there was mostly the same behavior and a small list > of differences. I'm thinking about the behavior of longjmp(), > swapcontext(), etc. Yes, very much so. sigaltcontext() too.
On Tue, 2023-06-13 at 18:57 +0100, Mark Brown wrote: > > > > For my part, the thing I would really like to see unified as > > > > much > > as > > > > possible is at the app developer's interface (glibc/gcc). The > > > > idea > > > > would be to make it easy for app developers to know if their > > > > app > > > > supports shadow stack. There will probably be some differences, > > > > but > > it > > > > would be great if there was mostly the same behavior and a > > > > small > > list > > > > of differences. I'm thinking about the behavior of longjmp(), > > > > swapcontext(), etc. > > > > Yes, very much so. sigaltcontext() too. For alt shadow stack's, this is what I came up with: https://lore.kernel.org/lkml/20220929222936.14584-40-rick.p.edgecombe@intel.com/ Unfortunately it can't work automatically with sigaltstack(). Since it has to be a new thing anyway, it's been left for the future. I guess that might have a better chance of being cross arch. BTW, last time this series accidentally broke an arm config and made it all the way through the robots up to Linus. Would you mind giving patches 1-3 a check? Thanks, Rick
The 06/13/2023 19:57, Edgecombe, Rick P wrote: > On Tue, 2023-06-13 at 18:57 +0100, Mark Brown wrote: > > > > > For my part, the thing I would really like to see unified as > > > > > much > > as > > > > > possible is at the app developer's interface (glibc/gcc). The > > > > > idea > > > > > would be to make it easy for app developers to know if their > > > > > app > > > > > supports shadow stack. There will probably be some differences, > > > > > but > > it > > > > > would be great if there was mostly the same behavior and a > > > > > small > > list > > > > > of differences. I'm thinking about the behavior of longjmp(), > > > > > swapcontext(), etc. > > > > > > Yes, very much so. sigaltcontext() too. > > For alt shadow stack's, this is what I came up with: > https://lore.kernel.org/lkml/20220929222936.14584-40-rick.p.edgecombe@intel.com/ > > Unfortunately it can't work automatically with sigaltstack(). Since it > has to be a new thing anyway, it's been left for the future. I guess > that might have a better chance of being cross arch. i dont think you can add sigaltshstk later. libgcc already has unwinder code for shstk and that cannot handle discontinous shadow stack. (may affect longjmp too depending on how it is implemented) we can change the unwinder now to know how to switch shstk when it unwinds the signal frame and backport that to systems that want to support shstk. or we can introduce a new elf marking scheme just for sigaltshstk when it is added so incompatibility can be detected. or we simply not support unwinding with sigaltshstk which would make it pretty much useless in practice. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Tue, Jun 13, 2023 at 07:57:37PM +0000, Edgecombe, Rick P wrote: > For alt shadow stack's, this is what I came up with: > https://lore.kernel.org/lkml/20220929222936.14584-40-rick.p.edgecombe@intel.com/ > Unfortunately it can't work automatically with sigaltstack(). Since it > has to be a new thing anyway, it's been left for the future. I guess > that might have a better chance of being cross arch. Yeah, I've not seen and can't think of anything that's entirely satisfactory either. Like Szabolcs says I do think we need a story on this. > BTW, last time this series accidentally broke an arm config and made it > all the way through the robots up to Linus. Would you mind giving > patches 1-3 a check? I'm in the middle of importing the whole series into my development branch, but note that I'm only really working with arm64 not arm so might miss stuff the bots would hit. Hopefully there should be some Tested-bys coming for arm64 anyway.
On Wed, 2023-06-14 at 11:43 +0100, szabolcs.nagy@arm.com wrote: > The 06/13/2023 19:57, Edgecombe, Rick P wrote: > > On Tue, 2023-06-13 at 18:57 +0100, Mark Brown wrote: > > > > > > For my part, the thing I would really like to see unified > > > > > > as > > > > > > much > > as > > > > > > possible is at the app developer's interface (glibc/gcc). > > > > > > The > > > > > > idea > > > > > > would be to make it easy for app developers to know if > > > > > > their > > > > > > app > > > > > > supports shadow stack. There will probably be some > > > > > > differences, > > > > > > but > > it > > > > > > would be great if there was mostly the same behavior and a > > > > > > small > > list > > > > > > of differences. I'm thinking about the behavior of > > > > > > longjmp(), > > > > > > swapcontext(), etc. > > > > > > > > Yes, very much so. sigaltcontext() too. > > > > For alt shadow stack's, this is what I came up with: > > https://lore.kernel.org/lkml/20220929222936.14584-40-rick.p.edgecombe@intel.com/ > > > > Unfortunately it can't work automatically with sigaltstack(). Since > > it > > has to be a new thing anyway, it's been left for the future. I > > guess > > that might have a better chance of being cross arch. > > i dont think you can add sigaltshstk later. > > libgcc already has unwinder code for shstk and that cannot handle > discontinous shadow stack. Are you referring to the existing C++ exception unwinding code that expects a different signal frame format? Yea this is a problem, but I don't see how it's a problem with any solutions now that will be harder later. I mentioned it when I brought up all the app compatibility problems.[0] The problem is that gcc expects a fixed 8 byte sized shadow stack signal frame. The format in these patches is such that it can be expanded for the sake of supporting alt shadow stack later, but it happens to be a fixed 8 bytes for now, so it will work seamlessly with these old gcc's. HJ has some patches to fix GCC to jump over a dynamically sized shadow stack signal frame, but this of course won't stop old gcc's from generating binaries that won't work with an expanded frame. I was waffling on whether it would be better to pad the shadow stack [1] signal frame to start, this would break compatibility with any binaries that use this -fnon-call-exceptions feature (if there are any), but would set us up better for the future if we got away with it. On one hand we are already juggling some compatibility issues so maybe it's not too much worse, but on the other hand the kernel is trying its best to be as compatible as it can given the situation. It doesn't *need* to break this compatibility at this point. In the end I thought it was better to deal with it later. > (may affect longjmp too depending on > how it is implemented) glibc's longjmp ignores anything everything it skips over and just does INCSSP until it gets back to the setjmp point. So it is not affected by the shadow stack signal frame format. I don't think we can support longjmping off an alt shadow stack unless we enable WRSS or get the kernel's help. So this was to be declared as unsupported. > > we can change the unwinder now to know how to switch shstk when > it unwinds the signal frame and backport that to systems that > want to support shstk. or we can introduce a new elf marking > scheme just for sigaltshstk when it is added so incompatibility > can be detected. or we simply not support unwinding with > sigaltshstk which would make it pretty much useless in practice. Yea, I was thinking along the same lines. Someday we could easily need some new marker. Maybe because we want to add something, or maybe because of the pre-existing userspace. In that case, this implementation will get the ball rolling and we can learn more about how shadow stack will be used. So if we need to break compatibility with any apps, we would not really be in a different situation than we are already in (if we are going to take proper care to not break userspace). So if/when that happens all the learning's can go into the clean break. But if it's not clear, unwinder's that properly use the format in these patches should work from an alt shadow stack implemented like that RFC linked earlier in the thread. At least it will be able to read back the shadow stack starting from the alt shadow stack, it can't actually resume control flow from where it unwound to. For that we need WRSS or some kernel help. [0] https://lore.kernel.org/lkml/7d8133c7e0186bdaeb3893c1c808148dc0d11945.camel@intel.com/
The 06/14/2023 16:57, Edgecombe, Rick P wrote: > On Wed, 2023-06-14 at 11:43 +0100, szabolcs.nagy@arm.com wrote: > > i dont think you can add sigaltshstk later. > > > > libgcc already has unwinder code for shstk and that cannot handle > > discontinous shadow stack. > > Are you referring to the existing C++ exception unwinding code that > expects a different signal frame format? Yea this is a problem, but I > don't see how it's a problem with any solutions now that will be harder > later. I mentioned it when I brought up all the app compatibility > problems.[0] there is old unwinder code incompatible with the current patches, but that was fixed. however the new unwind code assumes signal entry pushes one extra token that just have to be popped from the shstk. this abi cannot be expanded which means 1) kernel cannot push more tokens for more integrity checks (or to add whatever other features) 2) sigaltshstk cannot work. if the unwinder instead interprets the token to be the old ssp and either incssp or switch to that ssp (depending on continous or discontinous shstk, which the unwinder can detect), then 1) and 2) are fixed. but currently the distributed unwinder binary is incompatible with 1) and 2) so sigaltshstk cannot be added later. breaking the unwind abi is not acceptable. > The problem is that gcc expects a fixed 8 byte sized shadow stack > signal frame. The format in these patches is such that it can be > expanded for the sake of supporting alt shadow stack later, but it > happens to be a fixed 8 bytes for now, so it will work seamlessly with > these old gcc's. HJ has some patches to fix GCC to jump over a > dynamically sized shadow stack signal frame, but this of course won't > stop old gcc's from generating binaries that won't work with an > expanded frame. > > I was waffling on whether it would be better to pad the shadow stack > [1] signal frame to start, this would break compatibility with any > binaries that use this -fnon-call-exceptions feature (if there are > any), but would set us up better for the future if we got away with it. i don't see how -fnon-call-exceptions is relevant. you can unwind from a signal handler (this is not a c++ question but unwind abi question) and in practice eh works e.g. if the signal is raised (sync or async) in a frame where there are no cleanup handlers registered. in practice code rarely relies on this (because it's not valid in c++). the main user of this i know of is the glibc cancellation implmentation. (that is special in that it never catches the exception so ssp does not have to be updated for things to work, but in principle the unwinder should still verify the entries on shstk, otherwise the security guarantees are broken and the cleanup handlers can be hijacked. there are glibc abi issues that prevent fixing this, but in other libcs this may be still relevant). > On one hand we are already juggling some compatibility issues so maybe > it's not too much worse, but on the other hand the kernel is trying its > best to be as compatible as it can given the situation. It doesn't > *need* to break this compatibility at this point. > > In the end I thought it was better to deal with it later. > > > (may affect longjmp too depending on > > how it is implemented) > > glibc's longjmp ignores anything everything it skips over and just does > INCSSP until it gets back to the setjmp point. So it is not affected by > the shadow stack signal frame format. I don't think we can support > longjmping off an alt shadow stack unless we enable WRSS or get the > kernel's help. So this was to be declared as unsupported. longjmp can support discontinous shadow stack without wrss. the current code proposed to glibc does not, which is wrong (it breaks altshstk and green thread users like qemu for no good reason). declaring things unsupported means you have to go around to audit and mark binaries accordingly. > > we can change the unwinder now to know how to switch shstk when > > it unwinds the signal frame and backport that to systems that > > want to support shstk. or we can introduce a new elf marking > > scheme just for sigaltshstk when it is added so incompatibility > > can be detected. or we simply not support unwinding with > > sigaltshstk which would make it pretty much useless in practice. > > Yea, I was thinking along the same lines. Someday we could easily need > some new marker. Maybe because we want to add something, or maybe > because of the pre-existing userspace. In that case, this > implementation will get the ball rolling and we can learn more about > how shadow stack will be used. So if we need to break compatibility > with any apps, we would not really be in a different situation than we > are already in (if we are going to take proper care to not break > userspace). So if/when that happens all the learning's can go into the > clean break. > > But if it's not clear, unwinder's that properly use the format in these > patches should work from an alt shadow stack implemented like that RFC > linked earlier in the thread. At least it will be able to read back the > shadow stack starting from the alt shadow stack, it can't actually > resume control flow from where it unwound to. For that we need WRSS or > some kernel help. wrss is not needed to resume control flow on a different shstk. (if you needed wrss then the map_shadow_stack would be useless.) > > [0] > https://lore.kernel.org/lkml/7d8133c7e0186bdaeb3893c1c808148dc0d11945.camel@intel.com/ >
On Mon, 2023-06-19 at 09:47 +0100, szabolcs.nagy@arm.com wrote: > The 06/14/2023 16:57, Edgecombe, Rick P wrote: > > On Wed, 2023-06-14 at 11:43 +0100, szabolcs.nagy@arm.com wrote: > > > i dont think you can add sigaltshstk later. > > > > > > libgcc already has unwinder code for shstk and that cannot handle > > > discontinous shadow stack. > > > > Are you referring to the existing C++ exception unwinding code that > > expects a different signal frame format? Yea this is a problem, but > > I > > don't see how it's a problem with any solutions now that will be > > harder > > later. I mentioned it when I brought up all the app compatibility > > problems.[0] > > there is old unwinder code incompatible with the current patches, > but that was fixed. however the new unwind code assumes signal > entry pushes one extra token that just have to be popped from the > shstk. this abi cannot be expanded which means > > 1) kernel cannot push more tokens for more integrity checks > (or to add whatever other features) > > 2) sigaltshstk cannot work. > > if the unwinder instead interprets the token to be the old ssp and > either incssp or switch to that ssp (depending on continous or > discontinous shstk, which the unwinder can detect), then 1) and 2) > are fixed. > > but currently the distributed unwinder binary is incompatible with > 1) and 2) so sigaltshstk cannot be added later. breaking the unwind > abi is not acceptable. Can you point me to what you are talking about? I tested adding fields to the shadow stack on top of these changes. It worked with HJ's new (unposted I think) C++ changes for gcc. Adding fields doesn't work with the existing gcc because it assumes a fixed size. > > > The problem is that gcc expects a fixed 8 byte sized shadow stack > > signal frame. The format in these patches is such that it can be > > expanded for the sake of supporting alt shadow stack later, but it > > happens to be a fixed 8 bytes for now, so it will work seamlessly > > with > > these old gcc's. HJ has some patches to fix GCC to jump over a > > dynamically sized shadow stack signal frame, but this of course > > won't > > stop old gcc's from generating binaries that won't work with an > > expanded frame. > > > > I was waffling on whether it would be better to pad the shadow > > stack > > [1] signal frame to start, this would break compatibility with any > > binaries that use this -fnon-call-exceptions feature (if there are > > any), but would set us up better for the future if we got away with > > it. > > i don't see how -fnon-call-exceptions is relevant. It uses unwinder code that does assume a fixed shadow stack signal frame size. Since gcc 8.5 I think. So these compilers will continue to generate code that assumes a fixed frame size. This is one of the limitations we have for not moving to a new elf bit. > > you can unwind from a signal handler (this is not a c++ question > but unwind abi question) and in practice eh works e.g. if the > signal is raised (sync or async) in a frame where there are no > cleanup handlers registered. in practice code rarely relies on > this (because it's not valid in c++). the main user of this i > know of is the glibc cancellation implmentation. (that is special > in that it never catches the exception so ssp does not have to be > updated for things to work, but in principle the unwinder should > still verify the entries on shstk, otherwise the security > guarantees are broken and the cleanup handlers can be hijacked. > there are glibc abi issues that prevent fixing this, but in other > libcs this may be still relevant). I'm not fully sure what you are trying to say here. The glibc shadow stack stuff that is there today supports unwinding through a signal handler. The longjmp code (unlike fnon-call-exections) doesn't look at the shstk signal frame. It just does INCSSP until it reaches its desired SSP, not caring what it is INCSSPing over. > > > On one hand we are already juggling some compatibility issues so > > maybe > > it's not too much worse, but on the other hand the kernel is trying > > its > > best to be as compatible as it can given the situation. It doesn't > > *need* to break this compatibility at this point. > > > > In the end I thought it was better to deal with it later. > > > > > (may affect longjmp too depending on > > > how it is implemented) > > > > glibc's longjmp ignores anything everything it skips over and just > > does > > INCSSP until it gets back to the setjmp point. So it is not > > affected by > > the shadow stack signal frame format. I don't think we can support > > longjmping off an alt shadow stack unless we enable WRSS or get the > > kernel's help. So this was to be declared as unsupported. > > longjmp can support discontinous shadow stack without wrss. > the current code proposed to glibc does not, which is wrong > (it breaks altshstk and green thread users like qemu for no > good reason). > > declaring things unsupported means you have to go around to > audit and mark binaries accordingly. The idea that all apps can be supported without auditing has been assumed to be impossible by everyone I've talked to, including the GLIBC developers deeply versed in the architectural limitations of this feature. So if you have a magic solution, then that is a notable claim and I think you should propose it instead of just alluding to the fact that there is one. The only non-WRSS "longjmp from an alt shadow stack solution" that I can think of would have something like a new syscall performing some limited shadow stack actions normally prohibited in userspace by the architecture. We'd have to think through how this would impact the security. There are a lot of security/compatibility tradeoffs to parse in this. So also, just because something can be done, doesn't mean we should do it. I think the philosophy at this point is, lets get the basics working that can support most apps, and learn more about stuff like where this bar is in the real world. > > > > we can change the unwinder now to know how to switch shstk when > > > it unwinds the signal frame and backport that to systems that > > > want to support shstk. or we can introduce a new elf marking > > > scheme just for sigaltshstk when it is added so incompatibility > > > can be detected. or we simply not support unwinding with > > > sigaltshstk which would make it pretty much useless in practice. > > > > Yea, I was thinking along the same lines. Someday we could easily > > need > > some new marker. Maybe because we want to add something, or maybe > > because of the pre-existing userspace. In that case, this > > implementation will get the ball rolling and we can learn more > > about > > how shadow stack will be used. So if we need to break compatibility > > with any apps, we would not really be in a different situation than > > we > > are already in (if we are going to take proper care to not break > > userspace). So if/when that happens all the learning's can go into > > the > > clean break. > > > > But if it's not clear, unwinder's that properly use the format in > > these > > patches should work from an alt shadow stack implemented like that > > RFC > > linked earlier in the thread. At least it will be able to read back > > the > > shadow stack starting from the alt shadow stack, it can't actually > > resume control flow from where it unwound to. For that we need WRSS > > or > > some kernel help. > > wrss is not needed to resume control flow on a different shstk. WRSS lets you resume control flow at arbitrarily points by writing your own restore token. Otherwise there are restrictions. > > (if you needed wrss then the map_shadow_stack would be useless.) map_shadow_stack is usually prepopulated with a token, otherwise it does need WRSS to create one on it. > > > > > [0] > > https://lore.kernel.org/lkml/7d8133c7e0186bdaeb3893c1c808148dc0d11945.camel@intel.com/ > >
The 06/19/2023 16:44, Edgecombe, Rick P wrote: > On Mon, 2023-06-19 at 09:47 +0100, szabolcs.nagy@arm.com wrote: > > The 06/14/2023 16:57, Edgecombe, Rick P wrote: > > > On Wed, 2023-06-14 at 11:43 +0100, szabolcs.nagy@arm.com wrote: > > > > i dont think you can add sigaltshstk later. > > > > > > > > libgcc already has unwinder code for shstk and that cannot handle > > > > discontinous shadow stack. > > > > > > Are you referring to the existing C++ exception unwinding code that > > > expects a different signal frame format? Yea this is a problem, but > > > I > > > don't see how it's a problem with any solutions now that will be > > > harder > > > later. I mentioned it when I brought up all the app compatibility > > > problems.[0] > > > > there is old unwinder code incompatible with the current patches, > > but that was fixed. however the new unwind code assumes signal > > entry pushes one extra token that just have to be popped from the > > shstk. this abi cannot be expanded which means > > > > 1) kernel cannot push more tokens for more integrity checks > > (or to add whatever other features) > > > > 2) sigaltshstk cannot work. > > > > if the unwinder instead interprets the token to be the old ssp and > > either incssp or switch to that ssp (depending on continous or > > discontinous shstk, which the unwinder can detect), then 1) and 2) > > are fixed. > > > > but currently the distributed unwinder binary is incompatible with > > 1) and 2) so sigaltshstk cannot be added later. breaking the unwind > > abi is not acceptable. > > Can you point me to what you are talking about? I tested adding fields > to the shadow stack on top of these changes. It worked with HJ's new > (unposted I think) C++ changes for gcc. Adding fields doesn't work with > the existing gcc because it assumes a fixed size. if there is a fix that's good, i haven't seen it. my point was that the current unwinder works with current kernel patches, but does not allow future extensions which prevents sigaltshstk to work. the unwinder is not versioned so this cannot be fixed later. it only works if distros ensure shstk is disabled until the unwinder is fixed. (however there is no way to detect old unwinder if somebody builds gcc from source.) also note that there is generic code in the unwinder that will deal with this and likely the x86 patches will conflict with arm and riscv etc patches that try to fix the same issue.. so posting patches on the tools side of the abi would be useful at this point. > > > The problem is that gcc expects a fixed 8 byte sized shadow stack > > > signal frame. The format in these patches is such that it can be > > > expanded for the sake of supporting alt shadow stack later, but it > > > happens to be a fixed 8 bytes for now, so it will work seamlessly > > > with > > > these old gcc's. HJ has some patches to fix GCC to jump over a > > > dynamically sized shadow stack signal frame, but this of course > > > won't > > > stop old gcc's from generating binaries that won't work with an > > > expanded frame. > > > > > > I was waffling on whether it would be better to pad the shadow > > > stack > > > [1] signal frame to start, this would break compatibility with any > > > binaries that use this -fnon-call-exceptions feature (if there are > > > any), but would set us up better for the future if we got away with > > > it. > > > > i don't see how -fnon-call-exceptions is relevant. > > It uses unwinder code that does assume a fixed shadow stack signal > frame size. Since gcc 8.5 I think. So these compilers will continue to > generate code that assumes a fixed frame size. This is one of the > limitations we have for not moving to a new elf bit. how does "fixed shadow stack signal frame size" relates to "-fnon-call-exceptions"? if there were instruction boundaries within a function where the ret addr is not yet pushed or already poped from the shstk then the flag would be relevant, but since push/pop happens atomically at function entry/return -fnon-call-exceptions makes no difference as far as shstk unwinding is concerned. > > you can unwind from a signal handler (this is not a c++ question > > but unwind abi question) and in practice eh works e.g. if the > > signal is raised (sync or async) in a frame where there are no > > cleanup handlers registered. in practice code rarely relies on > > this (because it's not valid in c++). the main user of this i > > know of is the glibc cancellation implmentation. (that is special > > in that it never catches the exception so ssp does not have to be > > updated for things to work, but in principle the unwinder should > > still verify the entries on shstk, otherwise the security > > guarantees are broken and the cleanup handlers can be hijacked. > > there are glibc abi issues that prevent fixing this, but in other > > libcs this may be still relevant). > > I'm not fully sure what you are trying to say here. The glibc shadow you mentioned -fnon-call-exceptions and i'm saying even without that unwinding from signal handler is relevant and has to track shstk and ideally even update it for eh control transfer (i.e. properly switch to a different shstk in case of discontinous shstk). > stack stuff that is there today supports unwinding through a signal > handler. The longjmp code (unlike fnon-call-exections) doesn't look at > the shstk signal frame. It just does INCSSP until it reaches its > desired SSP, not caring what it is INCSSPing over. x86 longjmp has differnet problems (cannot handle discontinous shstk now). glibc cancellation is a mix of unwinding and special longjmp and it is currently broken in that the unwind bit cannot verify the return addresses. the unwinder does control transfer to cleanup handlers so control flow hijack is possible in principle on a corrupt stack (though i don't think cancellation is a practical attack surface). > > longjmp can support discontinous shadow stack without wrss. > > the current code proposed to glibc does not, which is wrong > > (it breaks altshstk and green thread users like qemu for no > > good reason). > > > > declaring things unsupported means you have to go around to > > audit and mark binaries accordingly. > > The idea that all apps can be supported without auditing has been > assumed to be impossible by everyone I've talked to, including the > GLIBC developers deeply versed in the architectural limitations of this > feature. So if you have a magic solution, then that is a notable claim > and I think you should propose it instead of just alluding to the fact > that there is one. there is no magic, longjmp should be implemented as: target_ssp = read from jmpbuf; current_ssp = read ssp; for (p = target_ssp; p != current_ssp; p--) { if (*p == restore-token) { // target_ssp is on a different shstk. switch_shstk_to(p); break; } } for (; p != target_ssp; p++) // ssp is now on the same shstk as target. inc_ssp(); this is what setcontext is doing and longjmp can do the same: for programs that always longjmp within the same shstk the first loop is just p = current_ssp, but it also works when longjmp target is on a different shstk assuming nothing is running on that shstk, which is only possible if there is a restore token on top. this implies if the kernel switches shstk on signal entry it has to add a restore-token on the switched away shstk. > The only non-WRSS "longjmp from an alt shadow stack solution" that I > can think of would have something like a new syscall performing some > limited shadow stack actions normally prohibited in userspace by the there is setcontext and swapcontext already doing an shstk switch, i don't see why you think longjmp is different and needs magic syscalls or wrss. > architecture. We'd have to think through how this would impact the > security. There are a lot of security/compatibility tradeoffs to parse > in this. So also, just because something can be done, doesn't mean we > should do it. I think the philosophy at this point is, lets get the > basics working that can support most apps, and learn more about stuff > like where this bar is in the real world. i think longjmp should really be discussed with libc devs, not on the kernel list, since they know the practical constraints and trade-offs better. however longjmp is relevant for the signal abi design so it's not ideal to push a linux abi and then have the libc side discussion later..
On Tue, 2023-06-20 at 10:17 +0100, szabolcs.nagy@arm.com wrote: > if there is a fix that's good, i haven't seen it. > > my point was that the current unwinder works with current kernel > patches, but does not allow future extensions which prevents > sigaltshstk to work. the unwinder is not versioned so this cannot > be fixed later. it only works if distros ensure shstk is disabled > until the unwinder is fixed. (however there is no way to detect > old unwinder if somebody builds gcc from source.) This is a problem the kernel is having to deal with, not causing. The userspace changes were upstreamed before the kernel. Userspace folks are adamantly against moving to a new elf bit, to start over with a clean slate. I tried everything to influence this and was not successful. So I'm still not sure what the proposal here is for the kernel. I am guessing that the fnon-call-exceptions/expanded frame size incompatibilities could end up causing something to grow an opt-in at some point. > > also note that there is generic code in the unwinder that will > deal with this and likely the x86 patches will conflict with > arm and riscv etc patches that try to fix the same issue.. > so posting patches on the tools side of the abi would be useful > at this point. The glibc patches are unfortunately mostly upstream already. See HJ for the diff that targets the new enabling interface. From lessons learned earlier in this effort, he was not going push those changes before the kernel support was upstream. There shouldn't be any glibc changes to signal or longjmp stuff in those AFAIK though. [ snip ] > how does "fixed shadow stack signal frame size" relates to > "-fnon-call-exceptions"? > > if there were instruction boundaries within a function where the > ret addr is not yet pushed or already poped from the shstk then > the flag would be relevant, but since push/pop happens atomically > at function entry/return -fnon-call-exceptions makes no > difference as far as shstk unwinding is concerned. As I said, the existing unwinding code for fnon-call-excecptions assumes a fixed shadow stack signal frame size of 8 bytes. Since the exception is thrown out of a signal, it needs to know how to unwind through the shadow stack signal frame. [ snip ] > there is no magic, longjmp should be implemented as: > > target_ssp = read from jmpbuf; > current_ssp = read ssp; > for (p = target_ssp; p != current_ssp; p--) { > if (*p == restore-token) { > // target_ssp is on a different shstk. > switch_shstk_to(p); > break; > } > } > for (; p != target_ssp; p++) > // ssp is now on the same shstk as target. > inc_ssp(); > > this is what setcontext is doing and longjmp can do the same: > for programs that always longjmp within the same shstk the first > loop is just p = current_ssp, but it also works when longjmp > target is on a different shstk assuming nothing is running on > that shstk, which is only possible if there is a restore token > on top. > > this implies if the kernel switches shstk on signal entry it has > to add a restore-token on the switched away shstk. I actually did a POC for this, but rejected it. The problem is, if there is a shadow stack overflow at that point then the kernel can't push the shadow stack token to the old stack. And shadow stack overflow is exactly the alt shadow stack use case. So it doesn't really solve the problem. This reasoning was actually elaborated on when the alt shadow stack patches were posted. And it looks like I previously pointed you at it. This history here is quite long and complicated, but I’ve done my best to summarize it in the coverletters. It would be helpful if you could review those links. [ snip ] > i think longjmp should really be discussed with libc devs, > not on the kernel list, since they know the practical > constraints and trade-offs better. however longjmp is > relevant for the signal abi design so it's not ideal to > push a linux abi and then have the libc side discussion > later.. It sounds like you are aware of the limitations the pre-existing upstream userspace places on the shadow stack signal frame. We also previously discussed how the kernel had to work around other aspects of upstream userspace that assumed undecided kernel ABI. How on earth are you getting that the kernel ABI is being pushed before input from the userspace side? The situation is the opposite.
The 06/20/2023 19:34, Edgecombe, Rick P wrote: > On Tue, 2023-06-20 at 10:17 +0100, szabolcs.nagy@arm.com wrote: > > if there is a fix that's good, i haven't seen it. > > > > my point was that the current unwinder works with current kernel > > patches, but does not allow future extensions which prevents > > sigaltshstk to work. the unwinder is not versioned so this cannot > > be fixed later. it only works if distros ensure shstk is disabled > > until the unwinder is fixed. (however there is no way to detect > > old unwinder if somebody builds gcc from source.) > > This is a problem the kernel is having to deal with, not causing. The > userspace changes were upstreamed before the kernel. Userspace folks > are adamantly against moving to a new elf bit, to start over with a > clean slate. I tried everything to influence this and was not > successful. So I'm still not sure what the proposal here is for the > kernel. i agree, the glibc and libgcc patches should not have been accepted before a linux abi. but the other direction also holds: the linux patches should not be pushed before the userspace design is discussed. (the current code upstream is wrong, and new code for the proposed linux abi is not posted yet. this is not your fault, i'm saying it here, because the discussion is here.) > I am guessing that the fnon-call-exceptions/expanded frame size > incompatibilities could end up causing something to grow an opt-in at > some point. there are independent userspace components and not every component has a chance to opt-in. > > how does "fixed shadow stack signal frame size" relates to > > "-fnon-call-exceptions"? > > > > if there were instruction boundaries within a function where the > > ret addr is not yet pushed or already poped from the shstk then > > the flag would be relevant, but since push/pop happens atomically > > at function entry/return -fnon-call-exceptions makes no > > difference as far as shstk unwinding is concerned. > > As I said, the existing unwinding code for fnon-call-excecptions > assumes a fixed shadow stack signal frame size of 8 bytes. Since the > exception is thrown out of a signal, it needs to know how to unwind > through the shadow stack signal frame. sorry but there is some misunderstanding about -fnon-call-exceptions. it is for emitting cleanup and exception handler data for a function such that throwing from certain instructions within that function works, while normally only throwing from calls work. it is not about *unwinding* from an async signal handler, which is -fasynchronous-unwind-tables and should always work on linux, nor for dealing with cleanup/exception handlers above the interrupted frame (likewise it works on linux without special cflags). as far as i can tell the current unwinder handles shstk unwinding correctly across signal handlers (sync or async and cleanup/exceptions handlers too), i see no issue with "fixed shadow stack signal frame size of 8 bytes" other than future extensions and discontinous shstk. > > there is no magic, longjmp should be implemented as: > > > > target_ssp = read from jmpbuf; > > current_ssp = read ssp; > > for (p = target_ssp; p != current_ssp; p--) { > > if (*p == restore-token) { > > // target_ssp is on a different shstk. > > switch_shstk_to(p); > > break; > > } > > } > > for (; p != target_ssp; p++) > > // ssp is now on the same shstk as target. > > inc_ssp(); > > > > this is what setcontext is doing and longjmp can do the same: > > for programs that always longjmp within the same shstk the first > > loop is just p = current_ssp, but it also works when longjmp > > target is on a different shstk assuming nothing is running on > > that shstk, which is only possible if there is a restore token > > on top. > > > > this implies if the kernel switches shstk on signal entry it has > > to add a restore-token on the switched away shstk. > > I actually did a POC for this, but rejected it. The problem is, if > there is a shadow stack overflow at that point then the kernel can't > push the shadow stack token to the old stack. And shadow stack overflow > is exactly the alt shadow stack use case. So it doesn't really solve > the problem. the restore token in the alt shstk case does not regress anything but makes some use-cases work. alt shadow stack is important if code tries to jump in and out of signal handlers (dosemu does this with swapcontext) and for that a restore token is needed. alt shadow stack is important if the original shstk did not overflow but the signal handler would overflow it (small thread stack, huge sigaltstack case). alt shadow stack is also important for crash reporting on shstk overflow even if longjmp does not work then. longjmp to a makecontext stack would still work and longjmp back to the original stack can be made to mostly work by an altshstk option to overwrite the top entry with a restore token on overflow (this can break unwinding though).
On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > On Tue, 2023-06-20 at 10:17 +0100, szabolcs.nagy@arm.com wrote: > > > > > > if there is a fix that's good, i haven't seen it. > > > > > > > > > > > > my point was that the current unwinder works with current > > > > > > kernel > > > > > > patches, but does not allow future extensions which > > > > > > prevents > > > > > > sigaltshstk to work. the unwinder is not versioned so this > > > > > > cannot > > > > > > be fixed later. it only works if distros ensure shstk is > > > > > > disabled > > > > > > until the unwinder is fixed. (however there is no way to > > > > > > detect > > > > > > old unwinder if somebody builds gcc from source.) > > > > > > > > This is a problem the kernel is having to deal with, not > > > > causing. > > The > > > > userspace changes were upstreamed before the kernel. Userspace > > > > > > folks > > > > are adamantly against moving to a new elf bit, to start over > > > > with a > > > > clean slate. I tried everything to influence this and was not > > > > successful. So I'm still not sure what the proposal here is for > > > > the > > > > kernel. > > > > i agree, the glibc and libgcc patches should not have been accepted > > before a linux abi. > > > > but the other direction also holds: the linux patches should not be > > pushed before the userspace design is discussed. (the current code > > upstream is wrong, and new code for the proposed linux abi is not > > posted yet. this is not your fault, i'm saying it here, because the > > discussion is here.) This series has been discussed with glibc/gcc developers regularly throughout the enabling effort. In fact there have been ongoing discussions about future shadow stack functionality. It's not like this feature has been a fast or hidden effort. You are just walking into the tail end of it. (much of it predates my involvement BTW, including the initial glibc support) AFAIK HJ presented the enabling changes at some glibc meeting. The signal side of glibc is unchanged from what is already upstream. So I'm not sure characterizing it that way is fair. It seems you were not part of those old discussions, but that might be because your interest is new. In any case we are constrained by some of these earlier outcomes. More on that below. > > > > > > I am guessing that the fnon-call-exceptions/expanded frame size > > > > incompatibilities could end up causing something to grow an > > > > opt-in > > at > > > > some point. > > > > there are independent userspace components and not every component > > has a chance to opt-in. > > > > > > > > how does "fixed shadow stack signal frame size" relates to > > > > > > "-fnon-call-exceptions"? > > > > > > > > > > > > if there were instruction boundaries within a function > > > > > > where the > > > > > > ret addr is not yet pushed or already poped from the shstk > > > > > > then > > > > > > the flag would be relevant, but since push/pop happens > > > > > > atomically > > > > > > at function entry/return -fnon-call-exceptions makes no > > > > > > difference as far as shstk unwinding is concerned. > > > > > > > > As I said, the existing unwinding code for fnon-call- > > > > excecptions > > > > assumes a fixed shadow stack signal frame size of 8 bytes. > > > > Since > > the > > > > exception is thrown out of a signal, it needs to know how to > > > > unwind > > > > through the shadow stack signal frame. > > > > sorry but there is some misunderstanding about -fnon-call- > > exceptions. > > > > it is for emitting cleanup and exception handler data for a > > function > > such that throwing from certain instructions within that function > > works, while normally only throwing from calls work. > > > > it is not about *unwinding* from an async signal handler, which is > > -fasynchronous-unwind-tables and should always work on linux, nor > > for > > dealing with cleanup/exception handlers above the interrupted frame > > (likewise it works on linux without special cflags). > > > > as far as i can tell the current unwinder handles shstk unwinding > > correctly across signal handlers (sync or async and > > > cleanup/exceptions > > handlers too), i see no issue with "fixed shadow stack signal frame > > size of 8 bytes" other than future extensions and discontinous > > shstk. HJ, can you link your patch that makes it extensible and we can clear this up? Maybe the issue extends beyond fnon-call-exceptions, but that is where I reproduced it. > > > > > > > > there is no magic, longjmp should be implemented as: > > > > > > > > > > > > target_ssp = read from jmpbuf; > > > > > > current_ssp = read ssp; > > > > > > for (p = target_ssp; p != current_ssp; p--) { > > > > > > if (*p == restore-token) { > > > > > > // target_ssp is on a different > > > > > > shstk. > > > > > > switch_shstk_to(p); > > > > > > break; > > > > > > } > > > > > > } > > > > > > for (; p != target_ssp; p++) > > > > > > // ssp is now on the same shstk as target. > > > > > > inc_ssp(); > > > > > > > > > > > > this is what setcontext is doing and longjmp can do the > > > > > > same: > > > > > > for programs that always longjmp within the same shstk the > > > > > > first > > > > > > loop is just p = current_ssp, but it also works when > > > > > > longjmp > > > > > > target is on a different shstk assuming nothing is running > > > > > > on > > > > > > that shstk, which is only possible if there is a restore > > > > > > token > > > > > > on top. > > > > > > > > > > > > this implies if the kernel switches shstk on signal entry > > > > > > it has > > > > > > to add a restore-token on the switched away shstk. > > > > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > if > > > > there is a shadow stack overflow at that point then the kernel > > > > > > can't > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > overflow > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > solve > > > > the problem. > > > > the restore token in the alt shstk case does not regress anything > > but > > makes some use-cases work. > > > > alt shadow stack is important if code tries to jump in and out of > > signal handlers (dosemu does this with swapcontext) and for that a > > restore token is needed. > > > > alt shadow stack is important if the original shstk did not > > overflow > > but the signal handler would overflow it (small thread stack, huge > > sigaltstack case). > > > > alt shadow stack is also important for crash reporting on shstk > > overflow even if longjmp does not work then. longjmp to a > > makecontext > > stack would still work and longjmp back to the original stack can > > be > > made to mostly work by an altshstk option to overwrite the top > > entry > > with a restore token on overflow (this can break unwinding though). > > There was previously a request to create an alt shadow stack for the purpose of handling shadow stack overflow. So you are now suggesting to to exclude that and instead target a different use case for alt shadow stack? But I'm not sure how much we should change the ABI at this point since we are constrained by existing userspace. If you read the history, we may end up needing to deprecate the whole elf bit for this and other reasons. So should we struggle to find a way to grow the existing ABI without disturbing the existing userspace? Or should we start with something, finally, and see where we need to grow and maybe get a chance at a fresh start to grow it? Like, maybe 3 people will show up saying "hey, I *really* need to use shadow stack and longjmp from a ucontext stack", and no one says anything about shadow stack overflow. Then we know what to do. And maybe dosemu decides it doesn't need to implement shadow stack (highly likely I would think). Now that I think about it, AFAIU SS_AUTODISARM was created for dosemu, and the alt shadow stack patch adopted this behavior. So it's speculation that there is even a problem in that scenario. Or maybe people just enable WRSS for longjmp() and directly jump back to the setjmp() point. Do most people want fast setjmp/longjmp() at the cost of a little security? Even if, with enough discussion, we could optimize for all hypotheticals without real user feedback, I don't see how it helps users to hold shadow stack. So I think we should move forward with the current ABI.
On Wed, 2023-06-21 at 11:54 -0700, Rick Edgecombe wrote: > > > > > > > there is no magic, longjmp should be implemented as: > > > > > > > > > > > > > > target_ssp = read from jmpbuf; > > > > > > > current_ssp = read ssp; > > > > > > > for (p = target_ssp; p != current_ssp; p--) { > > > > > > > if (*p == restore-token) { > > > > > > > // target_ssp is on a different > > > > > > > shstk. > > > > > > > switch_shstk_to(p); > > > > > > > break; > > > > > > > } > > > > > > > } > > > > > > > for (; p != target_ssp; p++) > > > > > > > // ssp is now on the same shstk as > > > > > > > target. > > > > > > > inc_ssp(); > > > > > > > > > > > > > > this is what setcontext is doing and longjmp can do the > > > > > > > same: > > > > > > > for programs that always longjmp within the same shstk > > > > > > > the > > > > > > > first > > > > > > > loop is just p = current_ssp, but it also works when > > > > > > > longjmp > > > > > > > target is on a different shstk assuming nothing is > > > > > > > running > > > > > > > on > > > > > > > that shstk, which is only possible if there is a restore > > > > > > > token > > > > > > > on top. > > > > > > > > > > > > > > this implies if the kernel switches shstk on signal entry > > > > > > > it has > > > > > > > to add a restore-token on the switched away shstk. Wait a second, the claim is that the kernel should add a restore token on the current shadow stack before handling a signal, to allow to unwind from an alt shadow stack, right? But in this series there is not an alt shadow stack, so signal will be handled on the current shadow stack. If the user stays on the current shadow stack, the existing simple INCSSP based solution will work. If the user swapcontext()'s away while handling a signal (which *is* currently supported) they will leave their own restore token on the old stack. Hypothetically glibc could unwind back through a series of ucontext stacks by pivoting, if it kept some metadata somewhere about where to restore to. So there are actually already enough tokens to make it back in this case, glibc just doesn't do this. But how does the proposed token placed by the kernel on the original stack help this problem? The longjmp() would have to be able to find the location of the restore tokens somehow, which would not necessarily be near the setjmp() point. The signal token could even be on a different shadow stack. So I think the above is short of a design for a universally compatible longjmp(). Which makes me think if we did want to make a more compatible longjmp() a better the way to do it might be an arch_prctl that emits a token at the current SSP. This would be loosening up the security somewhat (have to be an opt-in), but less so then enabling WRSS. But it would also be way simpler, work for all cases (I think), and be faster (maybe?) than INCSSPing through a bunch of stacks. I'm also not sure leaving a token on signal doesn't weaken the security it it's own way as well. Any thread could then swap to that token. Where as the shadow stack signal frame ssp pointer can only be used from the shadow stack the signal was handled on. So I think, in addition to blocking the shadow stack overflow use case in the future, leaving a token behind on signal will not really help longjmp(). (or at least I'm not following)
On Wed, Jun 21, 2023 at 11:54 AM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > > On Tue, 2023-06-20 at 10:17 +0100, szabolcs.nagy@arm.com wrote: > > > > > > > if there is a fix that's good, i haven't seen it. > > > > > > > > > > > > > > my point was that the current unwinder works with current > > > > > > > kernel > > > > > > > patches, but does not allow future extensions which > > > > > > > prevents > > > > > > > sigaltshstk to work. the unwinder is not versioned so this > > > > > > > cannot > > > > > > > be fixed later. it only works if distros ensure shstk is > > > > > > > disabled > > > > > > > until the unwinder is fixed. (however there is no way to > > > > > > > detect > > > > > > > old unwinder if somebody builds gcc from source.) > > > > > > > > > > This is a problem the kernel is having to deal with, not > > > > > causing. > > The > > > > > userspace changes were upstreamed before the kernel. Userspace > > > > > > > folks > > > > > are adamantly against moving to a new elf bit, to start over > > > > > with a > > > > > clean slate. I tried everything to influence this and was not > > > > > successful. So I'm still not sure what the proposal here is for > > > > > the > > > > > kernel. > > > > > > i agree, the glibc and libgcc patches should not have been accepted > > > before a linux abi. > > > > > > but the other direction also holds: the linux patches should not be > > > pushed before the userspace design is discussed. (the current code > > > upstream is wrong, and new code for the proposed linux abi is not > > > posted yet. this is not your fault, i'm saying it here, because the > > > discussion is here.) > > This series has been discussed with glibc/gcc developers regularly > throughout the enabling effort. In fact there have been ongoing > discussions about future shadow stack functionality. > > It's not like this feature has been a fast or hidden effort. You are > just walking into the tail end of it. (much of it predates my > involvement BTW, including the initial glibc support) > > AFAIK HJ presented the enabling changes at some glibc meeting. The > signal side of glibc is unchanged from what is already upstream. So I'm > not sure characterizing it that way is fair. It seems you were not part > of those old discussions, but that might be because your interest is > new. In any case we are constrained by some of these earlier outcomes. > More on that below. > > > > > > > > > I am guessing that the fnon-call-exceptions/expanded frame size > > > > > incompatibilities could end up causing something to grow an > > > > > opt-in > > at > > > > > some point. > > > > > > there are independent userspace components and not every component > > > has a chance to opt-in. > > > > > > > > > > how does "fixed shadow stack signal frame size" relates to > > > > > > > "-fnon-call-exceptions"? > > > > > > > > > > > > > > if there were instruction boundaries within a function > > > > > > > where the > > > > > > > ret addr is not yet pushed or already poped from the shstk > > > > > > > then > > > > > > > the flag would be relevant, but since push/pop happens > > > > > > > atomically > > > > > > > at function entry/return -fnon-call-exceptions makes no > > > > > > > difference as far as shstk unwinding is concerned. > > > > > > > > > > As I said, the existing unwinding code for fnon-call- > > > > > excecptions > > > > > assumes a fixed shadow stack signal frame size of 8 bytes. > > > > > Since > > the > > > > > exception is thrown out of a signal, it needs to know how to > > > > > unwind > > > > > through the shadow stack signal frame. > > > > > > sorry but there is some misunderstanding about -fnon-call- > > > exceptions. > > > > > > it is for emitting cleanup and exception handler data for a > > > function > > > such that throwing from certain instructions within that function > > > works, while normally only throwing from calls work. > > > > > > it is not about *unwinding* from an async signal handler, which is > > > -fasynchronous-unwind-tables and should always work on linux, nor > > > for > > > dealing with cleanup/exception handlers above the interrupted frame > > > (likewise it works on linux without special cflags). > > > > > > as far as i can tell the current unwinder handles shstk unwinding > > > correctly across signal handlers (sync or async and > > > > cleanup/exceptions > > > handlers too), i see no issue with "fixed shadow stack signal frame > > > size of 8 bytes" other than future extensions and discontinous > > > shstk. > > HJ, can you link your patch that makes it extensible and we can clear > this up? Maybe the issue extends beyond fnon-call-exceptions, but that > is where I reproduced it. Here is the patch: https://gitlab.com/x86-gcc/gcc/-/commit/aab4c24b67b5f05b72e52a3eaae005c2277710b9 > > > > > > > > > > there is no magic, longjmp should be implemented as: > > > > > > > > > > > > > > target_ssp = read from jmpbuf; > > > > > > > current_ssp = read ssp; > > > > > > > for (p = target_ssp; p != current_ssp; p--) { > > > > > > > if (*p == restore-token) { > > > > > > > // target_ssp is on a different > > > > > > > shstk. > > > > > > > switch_shstk_to(p); > > > > > > > break; > > > > > > > } > > > > > > > } > > > > > > > for (; p != target_ssp; p++) > > > > > > > // ssp is now on the same shstk as target. > > > > > > > inc_ssp(); > > > > > > > > > > > > > > this is what setcontext is doing and longjmp can do the > > > > > > > same: > > > > > > > for programs that always longjmp within the same shstk the > > > > > > > first > > > > > > > loop is just p = current_ssp, but it also works when > > > > > > > longjmp > > > > > > > target is on a different shstk assuming nothing is running > > > > > > > on > > > > > > > that shstk, which is only possible if there is a restore > > > > > > > token > > > > > > > on top. > > > > > > > > > > > > > > this implies if the kernel switches shstk on signal entry > > > > > > > it has > > > > > > > to add a restore-token on the switched away shstk. > > > > > > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > > if > > > > > there is a shadow stack overflow at that point then the kernel > > > > > > > can't > > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > > overflow > > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > > solve > > > > > the problem. > > > > > > the restore token in the alt shstk case does not regress anything > > > but > > > makes some use-cases work. > > > > > > alt shadow stack is important if code tries to jump in and out of > > > signal handlers (dosemu does this with swapcontext) and for that a > > > restore token is needed. > > > > > > alt shadow stack is important if the original shstk did not > > > overflow > > > but the signal handler would overflow it (small thread stack, huge > > > sigaltstack case). > > > > > > alt shadow stack is also important for crash reporting on shstk > > > overflow even if longjmp does not work then. longjmp to a > > > makecontext > > > stack would still work and longjmp back to the original stack can > > > be > > > made to mostly work by an altshstk option to overwrite the top > > > entry > > > with a restore token on overflow (this can break unwinding though). > > > > > There was previously a request to create an alt shadow stack for the > purpose of handling shadow stack overflow. So you are now suggesting to > to exclude that and instead target a different use case for alt shadow > stack? > > But I'm not sure how much we should change the ABI at this point since > we are constrained by existing userspace. If you read the history, we > may end up needing to deprecate the whole elf bit for this and other > reasons. > > So should we struggle to find a way to grow the existing ABI without > disturbing the existing userspace? Or should we start with something, > finally, and see where we need to grow and maybe get a chance at a > fresh start to grow it? > > Like, maybe 3 people will show up saying "hey, I *really* need to use > shadow stack and longjmp from a ucontext stack", and no one says > anything about shadow stack overflow. Then we know what to do. And > maybe dosemu decides it doesn't need to implement shadow stack (highly > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM > was created for dosemu, and the alt shadow stack patch adopted this > behavior. So it's speculation that there is even a problem in that > scenario. > > Or maybe people just enable WRSS for longjmp() and directly jump back > to the setjmp() point. Do most people want fast setjmp/longjmp() at the > cost of a little security? > > Even if, with enough discussion, we could optimize for all > hypotheticals without real user feedback, I don't see how it helps > users to hold shadow stack. So I think we should move forward with the > current ABI. > >
On Wed, Jun 21, 2023 at 3:23 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Wed, 2023-06-21 at 11:54 -0700, Rick Edgecombe wrote: > > > > > > > > there is no magic, longjmp should be implemented as: > > > > > > > > > > > > > > > > target_ssp = read from jmpbuf; > > > > > > > > current_ssp = read ssp; > > > > > > > > for (p = target_ssp; p != current_ssp; p--) { > > > > > > > > if (*p == restore-token) { > > > > > > > > // target_ssp is on a different > > > > > > > > shstk. > > > > > > > > switch_shstk_to(p); > > > > > > > > break; > > > > > > > > } > > > > > > > > } > > > > > > > > for (; p != target_ssp; p++) > > > > > > > > // ssp is now on the same shstk as > > > > > > > > target. > > > > > > > > inc_ssp(); > > > > > > > > > > > > > > > > this is what setcontext is doing and longjmp can do the > > > > > > > > same: > > > > > > > > for programs that always longjmp within the same shstk > > > > > > > > the > > > > > > > > first > > > > > > > > loop is just p = current_ssp, but it also works when > > > > > > > > longjmp > > > > > > > > target is on a different shstk assuming nothing is > > > > > > > > running > > > > > > > > on > > > > > > > > that shstk, which is only possible if there is a restore > > > > > > > > token > > > > > > > > on top. > > > > > > > > > > > > > > > > this implies if the kernel switches shstk on signal entry > > > > > > > > it has > > > > > > > > to add a restore-token on the switched away shstk. > > Wait a second, the claim is that the kernel should add a restore token > on the current shadow stack before handling a signal, to allow to > unwind from an alt shadow stack, right? But in this series there is not > an alt shadow stack, so signal will be handled on the current shadow > stack. If the user stays on the current shadow stack, the existing > simple INCSSP based solution will work. > > If the user swapcontext()'s away while handling a signal (which *is* > currently supported) they will leave their own restore token on the old > stack. Hypothetically glibc could unwind back through a series of > ucontext stacks by pivoting, if it kept some metadata somewhere about > where to restore to. So there are actually already enough tokens to > make it back in this case, glibc just doesn't do this. > > But how does the proposed token placed by the kernel on the original > stack help this problem? The longjmp() would have to be able to find > the location of the restore tokens somehow, which would not necessarily > be near the setjmp() point. The signal token could even be on a > different shadow stack. > > So I think the above is short of a design for a universally compatible > longjmp(). > > Which makes me think if we did want to make a more compatible longjmp() > a better the way to do it might be an arch_prctl that emits a token at > the current SSP. This would be loosening up the security somewhat (have > to be an opt-in), but less so then enabling WRSS. But it would also be > way simpler, work for all cases (I think), and be faster (maybe?) than > INCSSPing through a bunch of stacks. Since longjmp isn't required to be called after setjmp, leaving a restore token doesn't work when longjmp isn't called. > I'm also not sure leaving a token on signal doesn't weaken the security > it it's own way as well. Any thread could then swap to that token. > Where as the shadow stack signal frame ssp pointer can only be used > from the shadow stack the signal was handled on. > > So I think, in addition to blocking the shadow stack overflow use case > in the future, leaving a token behind on signal will not really help > longjmp(). (or at least I'm not following) >
On Wed, 2023-06-21 at 16:05 -0700, H.J. Lu wrote: > > Which makes me think if we did want to make a more compatible > > longjmp() > > a better the way to do it might be an arch_prctl that emits a token > > at > > the current SSP. This would be loosening up the security somewhat > > (have > > to be an opt-in), but less so then enabling WRSS. But it would also > > be > > way simpler, work for all cases (I think), and be faster (maybe?) > > than > > INCSSPing through a bunch of stacks. > > Since longjmp isn't required to be called after setjmp, leaving a > restore > token doesn't work when longjmp isn't called. Oh good point. Hmm.
On Wed, 2023-06-21 at 16:15 -0700, Rick Edgecombe wrote: > On Wed, 2023-06-21 at 16:05 -0700, H.J. Lu wrote: > > > Which makes me think if we did want to make a more compatible > > > longjmp() > > > a better the way to do it might be an arch_prctl that emits a > > > token > > > at > > > the current SSP. This would be loosening up the security somewhat > > > (have > > > to be an opt-in), but less so then enabling WRSS. But it would > > > also > > > be > > > way simpler, work for all cases (I think), and be faster (maybe?) > > > than > > > INCSSPing through a bunch of stacks. > > > > Since longjmp isn't required to be called after setjmp, leaving a > > restore > > token doesn't work when longjmp isn't called. > > Oh good point. Hmm. Just had a quick chat with HJ on this. It seems like it *might* be able to made to work. How it would go is setjmp() could act as a wrapper by calling it's own return address (the function that called setjmp()). This would mean in the case of longjmp() not being called, control flow would return through setjmp() before returning from the calling method. This would allow libc to do a RSTORSSP when returning though setjmp() in the non-shadow stack case, and essentially skip over the kernel placed restore token, and then return from setjmp() like normal. In the case of longjmp() being called, it could RSTORSSP directly to the token, and then return from setjmp(). Another option could be getting the compilers help to do the RSTORSSP in the case of longjmp() not being called. Apparently compilers are aware of setjmp() and already do special things around it (makes sense I guess, but news to me). And also, this all would actually work with IBT, because the compiler knows already to add an endbr at that point right after setjmp(). I think neither of us were ready to bet on it, but thought maybe it could work. And even if it works it's much more complicated than I first thought, so I don't like it as much. It's also unclear what a change like that would mean for security. As for unwinding through the existing swapcontext() placed restore tokens, the problem was as assumed - that it's difficult to find them. Even considering brute force options like doing manual searches for a nearby token to use turned up edge cases pretty quick. So I think that kind of leaves us where we were originally, with no known solutions that would require breaking kernel ABI changes. Are you interested in helping get longjmp() from a ucontext stack working for shadow stack? One other thing that came up in the conversation was that while it is known that some apps are doing this, there are no tests for mixing longjmp and ucontext in glibc. So we may not know which combinations of mixing them together even work in the non-shadow stack case. It could be useful to add some tests for this to glibc and we could get some clarity on what behaviors shadow stack would actually need to support.
On Wed, Jun 21, 2023 at 6:07 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Wed, 2023-06-21 at 16:15 -0700, Rick Edgecombe wrote: > > On Wed, 2023-06-21 at 16:05 -0700, H.J. Lu wrote: > > > > Which makes me think if we did want to make a more compatible > > > > longjmp() > > > > a better the way to do it might be an arch_prctl that emits a > > > > token > > > > at > > > > the current SSP. This would be loosening up the security somewhat > > > > (have > > > > to be an opt-in), but less so then enabling WRSS. But it would > > > > also > > > > be > > > > way simpler, work for all cases (I think), and be faster (maybe?) > > > > than > > > > INCSSPing through a bunch of stacks. > > > > > > Since longjmp isn't required to be called after setjmp, leaving a > > > restore > > > token doesn't work when longjmp isn't called. > > > > Oh good point. Hmm. > > Just had a quick chat with HJ on this. It seems like it *might* be able > to made to work. How it would go is setjmp() could act as a wrapper by > calling it's own return address (the function that called setjmp()). > This would mean in the case of longjmp() not being called, control flow > would return through setjmp() before returning from the calling method. It may not work since we can't tell if RAX (return value) is set by longjmp or function return. > This would allow libc to do a RSTORSSP when returning though setjmp() > in the non-shadow stack case, and essentially skip over the kernel > placed restore token, and then return from setjmp() like normal. In the > case of longjmp() being called, it could RSTORSSP directly to the > token, and then return from setjmp(). > > Another option could be getting the compilers help to do the RSTORSSP > in the case of longjmp() not being called. Apparently compilers are > aware of setjmp() and already do special things around it (makes sense > I guess, but news to me). > > And also, this all would actually work with IBT, because the compiler > knows already to add an endbr at that point right after setjmp(). > > I think neither of us were ready to bet on it, but thought maybe it > could work. And even if it works it's much more complicated than I > first thought, so I don't like it as much. It's also unclear what a > change like that would mean for security. > > As for unwinding through the existing swapcontext() placed restore > tokens, the problem was as assumed - that it's difficult to find them. > Even considering brute force options like doing manual searches for a > nearby token to use turned up edge cases pretty quick. So I think that > kind of leaves us where we were originally, with no known solutions > that would require breaking kernel ABI changes. > > > Are you interested in helping get longjmp() from a ucontext stack > working for shadow stack? One other thing that came up in the > conversation was that while it is known that some apps are doing this, > there are no tests for mixing longjmp and ucontext in glibc. So we may > not know which combinations of mixing them together even work in the > non-shadow stack case. > > It could be useful to add some tests for this to glibc and we could get > some clarity on what behaviors shadow stack would actually need to > support.
The 06/21/2023 16:02, H.J. Lu wrote: > On Wed, Jun 21, 2023 at 11:54 AM Edgecombe, Rick P > <rick.p.edgecombe@intel.com> wrote: > > HJ, can you link your patch that makes it extensible and we can clear > > this up? Maybe the issue extends beyond fnon-call-exceptions, but that > > is where I reproduced it. > > Here is the patch: > > https://gitlab.com/x86-gcc/gcc/-/commit/aab4c24b67b5f05b72e52a3eaae005c2277710b9 ok i don't see how this is related to fnon-call-exceptions.. but it is wrong if the shstk is ever discontinous even though the shstk format would allow that. this was my original point in this thread: with current unwinder code you cannot add altshstk later. and no, introducing new binary markings and rebuild the world just for altshstk is not OK. so we have to decide now if we want alt shstk in the future or not, i gave reasons why we would want it.
The 06/21/2023 22:22, Edgecombe, Rick P wrote: > On Wed, 2023-06-21 at 11:54 -0700, Rick Edgecombe wrote: > > > > > > > > there is no magic, longjmp should be implemented as: > > > > > > > > > > > > > > > > target_ssp = read from jmpbuf; > > > > > > > > current_ssp = read ssp; > > > > > > > > for (p = target_ssp; p != current_ssp; p--) { > > > > > > > > if (*p == restore-token) { > > > > > > > > // target_ssp is on a different > > > > > > > > shstk. > > > > > > > > switch_shstk_to(p); > > > > > > > > break; > > > > > > > > } > > > > > > > > } > > > > > > > > for (; p != target_ssp; p++) > > > > > > > > // ssp is now on the same shstk as > > > > > > > > target. > > > > > > > > inc_ssp(); > > > > > > > > > > > > > > > > this is what setcontext is doing and longjmp can do the > > > > > > > > same: > > > > > > > > for programs that always longjmp within the same shstk > > > > > > > > the > > > > > > > > first > > > > > > > > loop is just p = current_ssp, but it also works when > > > > > > > > longjmp > > > > > > > > target is on a different shstk assuming nothing is > > > > > > > > running > > > > > > > > on > > > > > > > > that shstk, which is only possible if there is a restore > > > > > > > > token > > > > > > > > on top. > > > > > > > > > > > > > > > > this implies if the kernel switches shstk on signal entry > > > > > > > > it has > > > > > > > > to add a restore-token on the switched away shstk. > > Wait a second, the claim is that the kernel should add a restore token > on the current shadow stack before handling a signal, to allow to > unwind from an alt shadow stack, right? But in this series there is not > an alt shadow stack, so signal will be handled on the current shadow > stack. If the user stays on the current shadow stack, the existing > simple INCSSP based solution will work. yes. > If the user swapcontext()'s away while handling a signal (which *is* > currently supported) they will leave their own restore token on the old > stack. Hypothetically glibc could unwind back through a series of > ucontext stacks by pivoting, if it kept some metadata somewhere about > where to restore to. So there are actually already enough tokens to > make it back in this case, glibc just doesn't do this. swapcontext is currently *not* supported: for it to work you have to be able to jump *back* into the signal handler, which does not work if the swapcontext target is on the original thread stack (instead of say a makecontext stack). jumping back can only be supported if alt stack can be paired with an alt shadow stack. unwinding across a series of signal interrupts should work even with discontinous shstk. libgcc does not implement this which is a problem i think. > But how does the proposed token placed by the kernel on the original > stack help this problem? The longjmp() would have to be able to find > the location of the restore tokens somehow, which would not necessarily > be near the setjmp() point. The signal token could even be on a > different shadow stack. i posted the exact longjmp code and it takes care of this case. setjmp does not need to do anything special. the invariant is that an shstk is either capped by a restore token or in use by some executing task. this is guaranteed architecturally (when shstk is switched with an instruction) and should be guaranteed by the kernel too (when shstk is switched by the kernel). > I'm also not sure leaving a token on signal doesn't weaken the security > it it's own way as well. Any thread could then swap to that token. > Where as the shadow stack signal frame ssp pointer can only be used > from the shadow stack the signal was handled on. as far as i'm concerned it is a valid programming model to switch to a stack that is currently not in use and we should always allow that. (signal handled on an alt stack may not return) > So I think, in addition to blocking the shadow stack overflow use case > in the future, leaving a token behind on signal will not really help > longjmp(). (or at least I'm not following) the restore token must only be added if shstk is switched (currently it is not switched so don't add it, however if we agree on this then the unwinder can be fixed accordingly.)
The 06/21/2023 18:54, Edgecombe, Rick P wrote: > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > > if > > > > > there is a shadow stack overflow at that point then the kernel > > > > > > > can't > > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > > overflow > > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > > solve > > > > > the problem. > > > > > > the restore token in the alt shstk case does not regress anything > > > but > > > makes some use-cases work. > > > > > > alt shadow stack is important if code tries to jump in and out of > > > signal handlers (dosemu does this with swapcontext) and for that a > > > restore token is needed. > > > > > > alt shadow stack is important if the original shstk did not > > > overflow > > > but the signal handler would overflow it (small thread stack, huge > > > sigaltstack case). > > > > > > alt shadow stack is also important for crash reporting on shstk > > > overflow even if longjmp does not work then. longjmp to a > > > makecontext > > > stack would still work and longjmp back to the original stack can > > > be > > > made to mostly work by an altshstk option to overwrite the top > > > entry > > > with a restore token on overflow (this can break unwinding though). > > > > > There was previously a request to create an alt shadow stack for the > purpose of handling shadow stack overflow. So you are now suggesting to > to exclude that and instead target a different use case for alt shadow > stack? that is not what i said. > But I'm not sure how much we should change the ABI at this point since > we are constrained by existing userspace. If you read the history, we > may end up needing to deprecate the whole elf bit for this and other > reasons. i'm not against deprecating the elf bit, but i think binary marking will be difficult for this kind of feature no matter what (code may be incompatible for complex runtime dependent reasons). > So should we struggle to find a way to grow the existing ABI without > disturbing the existing userspace? Or should we start with something, > finally, and see where we need to grow and maybe get a chance at a > fresh start to grow it? > > Like, maybe 3 people will show up saying "hey, I *really* need to use > shadow stack and longjmp from a ucontext stack", and no one says > anything about shadow stack overflow. Then we know what to do. And > maybe dosemu decides it doesn't need to implement shadow stack (highly > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM > was created for dosemu, and the alt shadow stack patch adopted this > behavior. So it's speculation that there is even a problem in that > scenario. > > Or maybe people just enable WRSS for longjmp() and directly jump back > to the setjmp() point. Do most people want fast setjmp/longjmp() at the > cost of a little security? > > Even if, with enough discussion, we could optimize for all > hypotheticals without real user feedback, I don't see how it helps > users to hold shadow stack. So I think we should move forward with the > current ABI. you may not get a second chance to fix a security feature. it will be just disabled if it causes problems.
On Thu, Jun 22, 2023 at 2:28 AM szabolcs.nagy@arm.com <szabolcs.nagy@arm.com> wrote: > > The 06/21/2023 18:54, Edgecombe, Rick P wrote: > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > > > if > > > > > > there is a shadow stack overflow at that point then the kernel > > > > > > > > can't > > > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > > > overflow > > > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > > > solve > > > > > > the problem. > > > > > > > > the restore token in the alt shstk case does not regress anything > > > > but > > > > makes some use-cases work. > > > > > > > > alt shadow stack is important if code tries to jump in and out of > > > > signal handlers (dosemu does this with swapcontext) and for that a > > > > restore token is needed. > > > > > > > > alt shadow stack is important if the original shstk did not > > > > overflow > > > > but the signal handler would overflow it (small thread stack, huge > > > > sigaltstack case). > > > > > > > > alt shadow stack is also important for crash reporting on shstk > > > > overflow even if longjmp does not work then. longjmp to a > > > > makecontext > > > > stack would still work and longjmp back to the original stack can > > > > be > > > > made to mostly work by an altshstk option to overwrite the top > > > > entry > > > > with a restore token on overflow (this can break unwinding though). > > > > > > > > There was previously a request to create an alt shadow stack for the > > purpose of handling shadow stack overflow. So you are now suggesting to > > to exclude that and instead target a different use case for alt shadow > > stack? > > that is not what i said. > > > But I'm not sure how much we should change the ABI at this point since > > we are constrained by existing userspace. If you read the history, we > > may end up needing to deprecate the whole elf bit for this and other > > reasons. > > i'm not against deprecating the elf bit, but i think binary > marking will be difficult for this kind of feature no matter what > (code may be incompatible for complex runtime dependent reasons). > > > So should we struggle to find a way to grow the existing ABI without > > disturbing the existing userspace? Or should we start with something, > > finally, and see where we need to grow and maybe get a chance at a > > fresh start to grow it? > > > > Like, maybe 3 people will show up saying "hey, I *really* need to use > > shadow stack and longjmp from a ucontext stack", and no one says > > anything about shadow stack overflow. Then we know what to do. And > > maybe dosemu decides it doesn't need to implement shadow stack (highly > > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM > > was created for dosemu, and the alt shadow stack patch adopted this > > behavior. So it's speculation that there is even a problem in that > > scenario. > > > > Or maybe people just enable WRSS for longjmp() and directly jump back > > to the setjmp() point. Do most people want fast setjmp/longjmp() at the > > cost of a little security? > > > > Even if, with enough discussion, we could optimize for all > > hypotheticals without real user feedback, I don't see how it helps > > users to hold shadow stack. So I think we should move forward with the > > current ABI. > > you may not get a second chance to fix a security feature. > it will be just disabled if it causes problems. *I* would use altshadowstack. I run a production system (that cares about correctness *and* performance, but that's not really relevant here -- SHSTK ought to be fast). And, if it crashes, I want to know why. So I handle SIGSEGV, etc so I have good logs if it crashes. And I want those same logs if I overflow the stack. That being said, I have no need for longjmp or siglongjmp for this. I use exit(2) to escape. For what it's worth, setjmp/longjmp is a bad API. The actual pattern that ought to work well (and that could be supported well by fancy compilers and non-C languages, as I understand it) is more like a function call that has two ways out. Like this (pseudo-C): void function(struct better_jmp_buf &buf, args...) { ... if (condition) better_long_jump(buf); // long jumps out! // could also pass buf to another function ... // could also return normally } better_call_with_jmp_buf(function, args); *This* could support altshadowstack just fine. And many users might be okay with the understanding that, if altshadowstack is on, you have to use a better long jump to get out (or a normal sigreturn or _exit). No one is getting an altshadowstack signal handler without code changes. siglongjmp() could support altshadowstack with help from the kernel, but we probably don't want to go there. --Andy
The 06/22/2023 08:26, Andy Lutomirski wrote: > On Thu, Jun 22, 2023 at 2:28 AM szabolcs.nagy@arm.com > <szabolcs.nagy@arm.com> wrote: > > > > The 06/21/2023 18:54, Edgecombe, Rick P wrote: > > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > > > > if > > > > > > > there is a shadow stack overflow at that point then the kernel > > > > > > > > > can't > > > > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > > > > overflow > > > > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > > > > solve > > > > > > > the problem. > > > > > > > > > > the restore token in the alt shstk case does not regress anything > > > > > but > > > > > makes some use-cases work. > > > > > > > > > > alt shadow stack is important if code tries to jump in and out of > > > > > signal handlers (dosemu does this with swapcontext) and for that a > > > > > restore token is needed. > > > > > > > > > > alt shadow stack is important if the original shstk did not > > > > > overflow > > > > > but the signal handler would overflow it (small thread stack, huge > > > > > sigaltstack case). > > > > > > > > > > alt shadow stack is also important for crash reporting on shstk > > > > > overflow even if longjmp does not work then. longjmp to a > > > > > makecontext > > > > > stack would still work and longjmp back to the original stack can > > > > > be > > > > > made to mostly work by an altshstk option to overwrite the top > > > > > entry > > > > > with a restore token on overflow (this can break unwinding though). > > > > > > > > > > > There was previously a request to create an alt shadow stack for the > > > purpose of handling shadow stack overflow. So you are now suggesting to > > > to exclude that and instead target a different use case for alt shadow > > > stack? > > > > that is not what i said. > > > > > But I'm not sure how much we should change the ABI at this point since > > > we are constrained by existing userspace. If you read the history, we > > > may end up needing to deprecate the whole elf bit for this and other > > > reasons. > > > > i'm not against deprecating the elf bit, but i think binary > > marking will be difficult for this kind of feature no matter what > > (code may be incompatible for complex runtime dependent reasons). > > > > > So should we struggle to find a way to grow the existing ABI without > > > disturbing the existing userspace? Or should we start with something, > > > finally, and see where we need to grow and maybe get a chance at a > > > fresh start to grow it? > > > > > > Like, maybe 3 people will show up saying "hey, I *really* need to use > > > shadow stack and longjmp from a ucontext stack", and no one says > > > anything about shadow stack overflow. Then we know what to do. And > > > maybe dosemu decides it doesn't need to implement shadow stack (highly > > > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM > > > was created for dosemu, and the alt shadow stack patch adopted this > > > behavior. So it's speculation that there is even a problem in that > > > scenario. > > > > > > Or maybe people just enable WRSS for longjmp() and directly jump back > > > to the setjmp() point. Do most people want fast setjmp/longjmp() at the > > > cost of a little security? > > > > > > Even if, with enough discussion, we could optimize for all > > > hypotheticals without real user feedback, I don't see how it helps > > > users to hold shadow stack. So I think we should move forward with the > > > current ABI. > > > > you may not get a second chance to fix a security feature. > > it will be just disabled if it causes problems. > > *I* would use altshadowstack. > > I run a production system (that cares about correctness *and* > performance, but that's not really relevant here -- SHSTK ought to be > fast). And, if it crashes, I want to know why. So I handle SIGSEGV, > etc so I have good logs if it crashes. And I want those same logs if > I overflow the stack. > > That being said, I have no need for longjmp or siglongjmp for this. I > use exit(2) to escape. the same crash handler that prints a log on shstk overflow should work when a different cause of SIGSEGV is recoverable via longjmp. to me this means that alt shstk must work with longjmp at least in the non-shstk overflow case (which can be declared non-recoverable). > For what it's worth, setjmp/longjmp is a bad API. The actual pattern > that ought to work well (and that could be supported well by fancy > compilers and non-C languages, as I understand it) is more like a > function call that has two ways out. Like this (pseudo-C): > > void function(struct better_jmp_buf &buf, args...) > { > ... > if (condition) > better_long_jump(buf); // long jumps out! > // could also pass buf to another function > ... > // could also return normally > } > > better_call_with_jmp_buf(function, args); > > *This* could support altshadowstack just fine. And many users might > be okay with the understanding that, if altshadowstack is on, you have > to use a better long jump to get out (or a normal sigreturn or _exit). i don't understand why this would work fine when longjmp does not. how does the shstk switch happen? > No one is getting an altshadowstack signal handler without code > changes. assuming the same component is doing the alt shstk setup as the longjmp. > siglongjmp() could support altshadowstack with help from the kernel, > but we probably don't want to go there. what kind of help? maybe we need that help.. e.g. if the signal frame token is detected by longjmp on the shstk then doing an rt_sigreturn with the right signal frame context allows longjmp to continue unwinding the shstk. however kernel sigcontext layout can change so userspace may not know it so longjmp needs a helper, but only in the jump across signal frame case. (this is a different design than what i proposed earlier, it also makes longjmp from alt shstk work without wrss, the downside is that longjmp across makecontext needs a separate solution then which implies that all shstk needs a detectable token at the end of the shstk.. so again something that we have to get right now and cannot add later.)
On Thu, 2023-06-22 at 08:40 +0100, szabolcs.nagy@arm.com wrote: > The 06/21/2023 16:02, H.J. Lu wrote: > > On Wed, Jun 21, 2023 at 11:54 AM Edgecombe, Rick P > > <rick.p.edgecombe@intel.com> wrote: > > > HJ, can you link your patch that makes it extensible and we can > > > clear > > > this up? Maybe the issue extends beyond fnon-call-exceptions, but > > > that > > > is where I reproduced it. > > > > Here is the patch: > > > > https://gitlab.com/x86-gcc/gcc/-/commit/aab4c24b67b5f05b72e52a3eaae005c2277710b9 > > ok i don't see how this is related to fnon-call-exceptions.. I don't know what to tell you. I'm not a compiler expert, but a simple fnon-call-exceptions test was how I reproduced it. If there is some other use case for throwing an exception out of a signal handler, I don't see how it makes fnon-call-exceptions unrelated. > > but it is wrong if the shstk is ever discontinous even though the > shstk format would allow that. this was my original point in this > thread: with current unwinder code you cannot add altshstk later. > and no, introducing new binary markings and rebuild the world just > for altshstk is not OK. so we have to decide now if we want alt > shstk in the future or not, i gave reasons why we would want it. The point was that the old GCCs restrict expanding the shadow stack signal frame, which is a prerequisite to supporting alt shadow stacks. So the existing userspace prevents us from supporting regular alt shadow stack, before we even get to fancy unwinding alt shadow stack. Some kind of ABI opt in will likely be required. Maybe a new elf bit, at which point we can take advantage of what we learned. You previously said: On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > as far as i can tell the current unwinder handles shstk unwinding > correctly across signal handlers (sync or async and > cleanup/exceptions > handlers too), i see no issue with "fixed shadow stack signal frame > size of 8 bytes" other than future extensions and discontinous shstk. I took that to mean that you didn't see how the the existing unwinder prevented alt shadow stacks. Hopefully we're all on the same page now. BTW, when alt shadow stack's were POCed, I hadn't encountered this GCC behavior yet. So it was assumed it could be bolted on later without disturbing anything. If Linus or someone wants to say we're ok with breaking these old GCCs in this way, the first thing I would do would be to pad the shadow stack signal frame with room for alt shadow stack and more. I actually have a patch to do this, but alas we are already pushing it regression wise.
On Thu, 2023-06-22 at 09:27 +0100, szabolcs.nagy@arm.com wrote: [ snip ] > swapcontext is currently *not* supported: for it to work you have to > be able to jump *back* into the signal handler, which does not work > if > the swapcontext target is on the original thread stack (instead of > say a makecontext stack). > > jumping back can only be supported if alt stack can be paired with > an alt shadow stack. > > unwinding across a series of signal interrupts should work even > with discontinous shstk. libgcc does not implement this which is > a problem i think. I would be helpful if you could enumerate the cases you think are important to support. I would like to see how we could support them in the future in some mode. > > > But how does the proposed token placed by the kernel on the > > original > > stack help this problem? The longjmp() would have to be able to > > find > > the location of the restore tokens somehow, which would not > > necessarily > > be near the setjmp() point. The signal token could even be on a > > different shadow stack. > > i posted the exact longjmp code and it takes care of this case. I see how it works for the simple case of longjmp() from an alt shadow stack. I would still prefer a different solution that works with the overflow case. (probably new kernel functionality) But I don't see how it works for unwinding off of a ucontext stack. Or unwinding off of an ucontext stack that was swapped to from an alt shadow stack. > > setjmp does not need to do anything special. > > the invariant is that an shstk is either capped by a restore token > or in use by some executing task. this is guaranteed architecturally > (when shstk is switched with an instruction) and should be guaranteed > by the kernel too (when shstk is switched by the kernel). > > > I'm also not sure leaving a token on signal doesn't weaken the > > security > > it it's own way as well. Any thread could then swap to that token. > > Where as the shadow stack signal frame ssp pointer can only be used > > from the shadow stack the signal was handled on. > > as far as i'm concerned it is a valid programming model to switch > to a stack that is currently not in use and we should always allow > that. (signal handled on an alt stack may not return) Some people just want shadow stack for like, a super stack canary, and want everything to "just work". Some people want to lock things down as much as possible and change their code to do it if needed. It sure is a challenge to figure out where the happy medium is. Ideally there would be several modes so all the users could be happy, but I wouldn't want to start with that for multiple reasons. Although we do have WRSS today, which can support pretty much everything functionality-wise. But in any case, we have limited room for movement. I actually had some other ABI tweaks fully ready to post around the tracing usages, but I just thought given the situation, it was better to start with what we have. This project could really use a time machine... > > > So I think, in addition to blocking the shadow stack overflow use > > case > > in the future, leaving a token behind on signal will not really > > help > > longjmp(). (or at least I'm not following) > > the restore token must only be added if shstk is switched > (currently it is not switched so don't add it, however if > we agree on this then the unwinder can be fixed accordingly.) I do agree that a token should not be added when the stack is not switched, as today. But I don't agree on this solution for alt shadow stacks. Again, I actually built that exact design in actual code, so it's not a NIH thing. It just doesn't work for valid use cases.
On Thu, 2023-06-22 at 17:42 +0100, szabolcs.nagy@arm.com wrote: > the downside is that longjmp across makecontext needs a > separate solution then which implies that all shstk needs > a detectable token at the end of the shstk.. so again > something that we have to get right now and cannot add > later.) This sounds like some scheme to search for a token on another stack, which if so, you haven't elaborated on. I'm not going to be able to contribute on this thread much over the next week, but if you think you know to solve problems which have remained unsolved for years, please spell out the solutions. I'd also appreciate if you could spell out exactly which: - ucontext - signal - longjmp - custom library stack switching patterns you think shadow stack should support working together. Because even after all these mails, I'm still not sure exactly what you are trying to achieve.
The 06/22/2023 16:47, Edgecombe, Rick P wrote: > On Thu, 2023-06-22 at 09:27 +0100, szabolcs.nagy@arm.com wrote: > > > [ snip ] > > > swapcontext is currently *not* supported: for it to work you have to > > be able to jump *back* into the signal handler, which does not work > > if > > the swapcontext target is on the original thread stack (instead of > > say a makecontext stack). > > > > jumping back can only be supported if alt stack can be paired with > > an alt shadow stack. > > > > unwinding across a series of signal interrupts should work even > > with discontinous shstk. libgcc does not implement this which is > > a problem i think. > > I would be helpful if you could enumerate the cases you think are > important to support. I would like to see how we could support them in > the future in some mode. > > > > > > But how does the proposed token placed by the kernel on the > > > original > > > stack help this problem? The longjmp() would have to be able to > > > find > > > the location of the restore tokens somehow, which would not > > > necessarily > > > be near the setjmp() point. The signal token could even be on a > > > different shadow stack. > > > > i posted the exact longjmp code and it takes care of this case. > > I see how it works for the simple case of longjmp() from an alt shadow > stack. I would still prefer a different solution that works with the > overflow case. (probably new kernel functionality) > > But I don't see how it works for unwinding off of a ucontext stack. Or > unwinding off of an ucontext stack that was swapped to from an alt > shadow stack. why? a stack can be active or inactive (task executing on it or not), and if inactive it can be live or dead (has stack frames that can be jumped to or not). this is independent of shadow stacks: longjmp is only valid if the target is either the same active stack or an inactive live stack. (there are cases that may seem to work, but fundamentally broken and not supportable: e.g. two tasks executing on the same stack where the one above happens to not clobber frames deep enough to collide with the task below.) the proposed longjmp design works for both cases. no assumption is made about ucontext or signals other than the shadow stack for an inactive live stack ends in a restore token, which is guaranteed by the isa so we only need the kernel to do the same when it switches shadow stacks. then longjmp works by construction. the only wart is that an overflowed shadow stack is inactive dead instead of inactive live because the token cannot be added. (note that handling shstk overflow and avoiding shstk overflow during signal handling still works with alt shstk!) an alternative solution is to allow jump to inactive dead stack if that's created by a signal interrupt. for that a syscall is needed and longjmp has to detect if the target stack is dead or live. (the kernel also has to be able to tell if switching to the dead stack is valid for security reasons.) i don't know if this is doable (if we allow some hacks it's doable). unwinding across signal handlers is just a matter of having enough information at the signal frame to continue, it does not have to follow crazy jumps or weird coroutine things: that does not work without shadow stacks either. but unwind across alt stack frame should work.
On Fri, 2023-06-23 at 17:25 +0100, szabolcs.nagy@arm.com wrote: > why? > > a stack can be active or inactive (task executing on it or not), > and if inactive it can be live or dead (has stack frames that can > be jumped to or not). > > this is independent of shadow stacks: longjmp is only valid if the > target is either the same active stack or an inactive live stack. > (there are cases that may seem to work, but fundamentally broken > and not supportable: e.g. two tasks executing on the same stack > where the one above happens to not clobber frames deep enough to > collide with the task below.) > > the proposed longjmp design works for both cases. no assumption is > made about ucontext or signals other than the shadow stack for an > inactive live stack ends in a restore token, One of the problems for the case of longjmp() from another another stack, is how to find the old stack's token. HJ and I had previously discussed searching for the token from the target SSP forward, but the problems are it is 1, not gauranteed to be there and 2, pretty awkward and potentially slow. > which is guaranteed by > the isa so we only need the kernel to do the same when it switches > shadow stacks. then longjmp works by construction. No it's not. > > the only wart is that an overflowed shadow stack is inactive dead > instead of inactive live because the token cannot be added. (note > that handling shstk overflow and avoiding shstk overflow during > signal handling still works with alt shstk!) I thought we were on the same page as far as pushing a restore token on signal not being robust against shadow stack overflow, so is this a new idea? Usually people around here say "code talks", but all I'm asking for is a full explanation of what you are trying to accomplish and what the idea is. Otherwise this is asking to hold up this feature based on hand waving. Could you answer the questions here, along with a full description of your proposal: https://lore.kernel.org/all/1cd67ae45fc379fd82d2745190e4caf74e67499e.camel@intel.com/ It we are talking past each other, it could help to do a levelset at this point. > > an alternative solution is to allow jump to inactive dead stack > if that's created by a signal interrupt. for that a syscall is > needed and longjmp has to detect if the target stack is dead or > live. (the kernel also has to be able to tell if switching to the > dead stack is valid for security reasons.) i don't know if this > is doable (if we allow some hacks it's doable). > > unwinding across signal handlers is just a matter of having > enough information at the signal frame to continue, it does > not have to follow crazy jumps or weird coroutine things: > that does not work without shadow stacks either. but unwind > across alt stack frame should work. Even if there is some workable idea, there is already a bunch of userspace built around the existing solution, and users waiting for it. In addition the whole discussion around alt shadow stack cases will require alt shadow stacks to be implemented and we might be constrained there anyway. If we want to take learnings and do something new, let's build it around a new elf bit. This current kernel ABI is to support the old elf bit userspace stack, which has been solidified by the existing upstream userspace. I had thought we should start from scratch now and proposed a patch to block the old elf bit to force this, but lost that battle with other glibc developers. Speaking of which, I don't see any enthusiasm from any other glibc developers that have been involved in this previously. Who were you thinking was going to implement any of this on the glibc side? Have you made any progress in getting any of them onboard with this order of operations in the months since you first brought up changing the design?
On Thu, Jun 22, 2023 at 9:43 AM szabolcs.nagy@arm.com <szabolcs.nagy@arm.com> wrote: > > The 06/22/2023 08:26, Andy Lutomirski wrote: > > On Thu, Jun 22, 2023 at 2:28 AM szabolcs.nagy@arm.com > > <szabolcs.nagy@arm.com> wrote: > > > > > > The 06/21/2023 18:54, Edgecombe, Rick P wrote: > > > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > > > > The 06/20/2023 19:34, Edgecombe, Rick P wrote: > > > > > > > > I actually did a POC for this, but rejected it. The problem is, > > > > > > > > if > > > > > > > > there is a shadow stack overflow at that point then the kernel > > > > > > > > > > can't > > > > > > > > push the shadow stack token to the old stack. And shadow stack > > > > > > > > > > overflow > > > > > > > > is exactly the alt shadow stack use case. So it doesn't really > > > > > > > > > > solve > > > > > > > > the problem. > > > > > > > > > > > > the restore token in the alt shstk case does not regress anything > > > > > > but > > > > > > makes some use-cases work. > > > > > > > > > > > > alt shadow stack is important if code tries to jump in and out of > > > > > > signal handlers (dosemu does this with swapcontext) and for that a > > > > > > restore token is needed. > > > > > > > > > > > > alt shadow stack is important if the original shstk did not > > > > > > overflow > > > > > > but the signal handler would overflow it (small thread stack, huge > > > > > > sigaltstack case). > > > > > > > > > > > > alt shadow stack is also important for crash reporting on shstk > > > > > > overflow even if longjmp does not work then. longjmp to a > > > > > > makecontext > > > > > > stack would still work and longjmp back to the original stack can > > > > > > be > > > > > > made to mostly work by an altshstk option to overwrite the top > > > > > > entry > > > > > > with a restore token on overflow (this can break unwinding though). > > > > > > > > > > > > > > There was previously a request to create an alt shadow stack for the > > > > purpose of handling shadow stack overflow. So you are now suggesting to > > > > to exclude that and instead target a different use case for alt shadow > > > > stack? > > > > > > that is not what i said. > > > > > > > But I'm not sure how much we should change the ABI at this point since > > > > we are constrained by existing userspace. If you read the history, we > > > > may end up needing to deprecate the whole elf bit for this and other > > > > reasons. > > > > > > i'm not against deprecating the elf bit, but i think binary > > > marking will be difficult for this kind of feature no matter what > > > (code may be incompatible for complex runtime dependent reasons). > > > > > > > So should we struggle to find a way to grow the existing ABI without > > > > disturbing the existing userspace? Or should we start with something, > > > > finally, and see where we need to grow and maybe get a chance at a > > > > fresh start to grow it? > > > > > > > > Like, maybe 3 people will show up saying "hey, I *really* need to use > > > > shadow stack and longjmp from a ucontext stack", and no one says > > > > anything about shadow stack overflow. Then we know what to do. And > > > > maybe dosemu decides it doesn't need to implement shadow stack (highly > > > > likely I would think). Now that I think about it, AFAIU SS_AUTODISARM > > > > was created for dosemu, and the alt shadow stack patch adopted this > > > > behavior. So it's speculation that there is even a problem in that > > > > scenario. > > > > > > > > Or maybe people just enable WRSS for longjmp() and directly jump back > > > > to the setjmp() point. Do most people want fast setjmp/longjmp() at the > > > > cost of a little security? > > > > > > > > Even if, with enough discussion, we could optimize for all > > > > hypotheticals without real user feedback, I don't see how it helps > > > > users to hold shadow stack. So I think we should move forward with the > > > > current ABI. > > > > > > you may not get a second chance to fix a security feature. > > > it will be just disabled if it causes problems. > > > > *I* would use altshadowstack. > > > > I run a production system (that cares about correctness *and* > > performance, but that's not really relevant here -- SHSTK ought to be > > fast). And, if it crashes, I want to know why. So I handle SIGSEGV, > > etc so I have good logs if it crashes. And I want those same logs if > > I overflow the stack. > > > > That being said, I have no need for longjmp or siglongjmp for this. I > > use exit(2) to escape. > > the same crash handler that prints a log on shstk overflow should > work when a different cause of SIGSEGV is recoverable via longjmp. > to me this means that alt shstk must work with longjmp at least in > the non-shstk overflow case (which can be declared non-recoverable). Sure, but how many SIGSEGV handlers would use altshadowstack and *also, in the same handler* ever resume? Not mine. Obviously I'm only one sample. > > > For what it's worth, setjmp/longjmp is a bad API. The actual pattern > > that ought to work well (and that could be supported well by fancy > > compilers and non-C languages, as I understand it) is more like a > > function call that has two ways out. Like this (pseudo-C): > > > > void function(struct better_jmp_buf &buf, args...) > > { > > ... > > if (condition) > > better_long_jump(buf); // long jumps out! > > // could also pass buf to another function > > ... > > // could also return normally > > } > > > > better_call_with_jmp_buf(function, args); > > > > *This* could support altshadowstack just fine. And many users might > > be okay with the understanding that, if altshadowstack is on, you have > > to use a better long jump to get out (or a normal sigreturn or _exit). > > i don't understand why this would work fine when longjmp does not. > how does the shstk switch happen? Ugh, I think this may have some issues given how the ISA works. Sigh. I was imagining that better_call_with_jmp_buf would push a restore token on the shadow stack, then call the passed-in function, then, on a successful return, INCSSP over the token and continue on. better_long_jump() would RSTORSSP to the saved token. But I'm not sure how to write the token without WRUSS. What *could* be done, which would be nasty and sigaltshadowstack-specific, is to have a jump out of a signal handler provide a pointer to the signal frame (siginfo_t or ucontext pointer), and the kernel would assist it in switching the shadow stack back. Eww. --Andy > > > No one is getting an altshadowstack signal handler without code > > changes. > > assuming the same component is doing the alt shstk setup as the > longjmp. > > > siglongjmp() could support altshadowstack with help from the kernel, > > but we probably don't want to go there. > > what kind of help? maybe we need that help.. > > e.g. if the signal frame token is detected by longjmp on > the shstk then doing an rt_sigreturn with the right signal > frame context allows longjmp to continue unwinding the shstk. > however kernel sigcontext layout can change so userspace may > not know it so longjmp needs a helper, but only in the jump > across signal frame case. > > (this is a different design than what i proposed earlier, > it also makes longjmp from alt shstk work without wrss, > the downside is that longjmp across makecontext needs a > separate solution then which implies that all shstk needs > a detectable token at the end of the shstk.. so again > something that we have to get right now and cannot add > later.)
The 06/22/2023 16:46, Edgecombe, Rick P wrote: > You previously said: > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > as far as i can tell the current unwinder handles shstk unwinding > > correctly across signal handlers (sync or async and > > cleanup/exceptions > > handlers too), i see no issue with "fixed shadow stack signal frame > > size of 8 bytes" other than future extensions and discontinous shstk. > > I took that to mean that you didn't see how the the existing unwinder > prevented alt shadow stacks. Hopefully we're all on the same page now. well alt shstk is discontinous. there were two separate confusions: - your mention of fnon-call-exception threw me off since that is a very specific corner case. - i was talking about an unwind design that can deal with altshstk which requires ssp switch. (current uwinder does not support this, but i assumed we can add that now and ignore old broken unwinders). you are saying that alt shstk support needs additional shstk tokens in the signal frame to maintain alt shstk state for the kernel. i think now we are on the same page. > BTW, when alt shadow stack's were POCed, I hadn't encountered this GCC > behavior yet. So it was assumed it could be bolted on later without > disturbing anything. If Linus or someone wants to say we're ok with > breaking these old GCCs in this way, the first thing I would do would > be to pad the shadow stack signal frame with room for alt shadow stack > and more. I actually have a patch to do this, but alas we are already > pushing it regression wise. sounds like it will be hard to add alt shstk later. (i think maintaining alt shstk state on the stack instead of shstk should work too. but if that does not work, then alt shstk will require another abi opt-in.)
On Mon, 2023-06-26 at 15:08 +0100, szabolcs.nagy@arm.com wrote: > The 06/22/2023 16:46, Edgecombe, Rick P wrote: > > You previously said: > > > > On Wed, 2023-06-21 at 12:36 +0100, szabolcs.nagy@arm.com wrote: > > > as far as i can tell the current unwinder handles shstk unwinding > > > correctly across signal handlers (sync or async and > > > cleanup/exceptions > > > handlers too), i see no issue with "fixed shadow stack signal > > > frame > > > size of 8 bytes" other than future extensions and discontinous > > > shstk. > > > > I took that to mean that you didn't see how the the existing > > unwinder > > prevented alt shadow stacks. Hopefully we're all on the same page > > now. > > well alt shstk is discontinous. > > there were two separate confusions: > > - your mention of fnon-call-exception threw me off since that is a > very specific corner case. > > - i was talking about an unwind design that can deal with altshstk > which requires ssp switch. (current uwinder does not support this, > but i assumed we can add that now and ignore old broken unwinders). > you are saying that alt shstk support needs additional shstk tokens > in the signal frame to maintain alt shstk state for the kernel. > > i think now we are on the same page. I don't think I fully understand your point still. It would really help if you could do a levelset summary, like I asked on other branches of this thread. > > > BTW, when alt shadow stack's were POCed, I hadn't encountered this > > GCC > > behavior yet. So it was assumed it could be bolted on later without > > disturbing anything. If Linus or someone wants to say we're ok with > > breaking these old GCCs in this way, the first thing I would do > > would > > be to pad the shadow stack signal frame with room for alt shadow > > stack > > and more. I actually have a patch to do this, but alas we are > > already > > pushing it regression wise. > > sounds like it will be hard to add alt shstk later. Adding alt shadow stack without moving the elf bit runs the risk of regressions because of the old GCCs. But moving the elf bit is easy (from the technical side at least). There were several threads about this in the past. So I don't think any harder than it is now. > > (i think maintaining alt shstk state on the stack instead of > shstk should work too. but if that does not work, then alt > shstk will require another abi opt-in.) The x86 sigframe is pretty full AFAIK. There are already features that require an opt in. It might be true that the alt shadow stack base and size don't need to be protected like the previous SSP. I'd have to think about it. It could be nice to optionally put some IBT stuff on the shadow stack as well, which would require growing the frame, and at that point the alt shadow stack stuff might as well go there too.
The 06/22/2023 23:18, Edgecombe, Rick P wrote: > I'd also appreciate if you could spell out exactly which: > - ucontext > - signal > - longjmp > - custom library stack switching > > patterns you think shadow stack should support working together. > Because even after all these mails, I'm still not sure exactly what you > are trying to achieve. i'm trying to support two operations (in any combination): (1) jump up the current (active) stack. (2) jump to a live frame in a different inactive but live stack. the old stack becomes inactive (= no task executes on it) and live (= has valid frames to jump to). with (3) the runtime must manage the shadow stacks transparently. (= portable c code does not need modifications) mapping this to c apis: - swapcontext, setcontext, longjmp, custom stack switching are jump operations. (there are conditions under which (1) and (2) must work, further details don't matter.) - makecontext creates an inactive live stack. - signal is only special if it executes on an alt stack: on signal entry the alt stack becomes active and the interrupted stack inactive but live. (nested signals execute on the alt stack until that is left either via a jump or signal return.) - unwinding can be implemented with jump operations (it needs some other things but that's out of scope here). the patterns that shadow stack should support falls out of this model. (e.g. posix does not allow jumping from one thread to the stack of a different thread, but the model does not care about that, it only cares if the target stack is inactive and live then jump should work.) some observations: - it is necessary for jump to detect case (2) and then switch to the target shadow stack. this is also sufficient to implement it. (note: the restore token can be used for detection since that is guaranteed to be present when user code creates an inactive live stack and is not present anywhere else by design. a different marking can be used if the inactive live stack is created by the kernel, but then the kernel has to provide a switch method, e.g. syscall. this should not be controversial.) - in this model two live stacks cannot use the same shadow stack since jumping between the two stacks is allowed in both directions, but jumping within a shadow stack only works in one direction. (also two tasks could execute on the same shadow stack then. and it makes shadow stack size accounting problematic.) - so sharing shadow stack with alt stack is broken. (the model is right in the sense that valid posix code can trigger the issue. we can ignore that corner case and adjust the model so the shared shadow stack works for alt stack, but it likely does not change the jump design: eventually we want alt shadow stack.) - shadow stack cannot always be managed by the runtime transparently: it has to be allocated for makecontext and alt stack in situations where allocation failure cannot be handled. more alarmingly the destruction of stacks may not be visible to the runtime so the corresponding shadow stacks leak. my preferred way to fix this is new apis that are shadow stack compatible (e.g. shadow_makecontext with shadow_freecontext) and marking the incompatible apis as such. portable code then can decide to update to new apis, run with shstk disabled or accept the leaks and OOM failures. the current approach needs ifdef __CET__ in user code for makecontext and sigaltstack has many issues. - i'm still not happy with the shadow stack sizing. and would like to have a token at the end of the shadow stack to allow scanning. and it would be nice to deal with shadow stack overflow. and there is async disable on dlopen. so there are things to work on. i understand that the proposed linux abi makes most existing binaries with shstk marking work, which is relevant for x86. for a while i thought we can fix the remaining issues even if that means breaking existing shstk binaries (just bump the abi marking). now it seems the issues can only be addressed in a future abi break. which means x86 linux will likely end up maintaining two incompatible abis and the future one will need user code and build system changes, not just runtime changes. it is not a small incremental change to add alt shadow stack support for example. i don't think the maintenance burden of two shadow stack abis is the right path for arm64 to follow, so the shadow stack semantics will likely become divergent not common across targets. i hope my position is now clearer.
On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@arm.com wrote: > The 06/22/2023 23:18, Edgecombe, Rick P wrote: > > I'd also appreciate if you could spell out exactly which: > > - ucontext > > - signal > > - longjmp > > - custom library stack switching > > > > patterns you think shadow stack should support working together. > > Because even after all these mails, I'm still not sure exactly what > > you > > are trying to achieve. Hi Szablocs, Thanks for writing all this up. It is helpful to understand where you are coming from. Please don't miss my point at the very bottom of this response. > > i'm trying to support two operations (in any combination): > > (1) jump up the current (active) stack. > > (2) jump to a live frame in a different inactive but live stack. > the old stack becomes inactive (= no task executes on it) > and live (= has valid frames to jump to). > > with > > (3) the runtime must manage the shadow stacks transparently. > (= portable c code does not need modifications) > > mapping this to c apis: > > - swapcontext, setcontext, longjmp, custom stack switching are jump > operations. (there are conditions under which (1) and (2) must > work, > further details don't matter.) > > - makecontext creates an inactive live stack. > > - signal is only special if it executes on an alt stack: on signal > entry the alt stack becomes active and the interrupted stack > inactive but live. (nested signals execute on the alt stack until > that is left either via a jump or signal return.) > > - unwinding can be implemented with jump operations (it needs some > other things but that's out of scope here). > > the patterns that shadow stack should support falls out of this > model. > (e.g. posix does not allow jumping from one thread to the stack of a > different thread, but the model does not care about that, it only > cares if the target stack is inactive and live then jump should > work.) > > some observations: > > - it is necessary for jump to detect case (2) and then switch to the > target shadow stack. this is also sufficient to implement it. > (note: > the restore token can be used for detection since that is > guaranteed > to be present when user code creates an inactive live stack and is > not present anywhere else by design. a different marking can be > used > if the inactive live stack is created by the kernel, but then the > kernel has to provide a switch method, e.g. syscall. this should > not > be controversial.) For x86's shadow stack you can jump to a new stack without leaving a token behind. I don't know if maybe we could make it a rule in the x86_64 ABI that you should always leave a token if you are going to mark the SHSTK elf bit. But if anything did this, then longjmp() could never make it back to the stack where setjmp() was called without kernel help. > > - in this model two live stacks cannot use the same shadow stack > since > jumping between the two stacks is allowed in both directions, but > jumping within a shadow stack only works in one direction. (also > two > tasks could execute on the same shadow stack then. and it makes > shadow stack size accounting problematic.) > > - so sharing shadow stack with alt stack is broken. (the model is > right in the sense that valid posix code can trigger the issue. Could you spell out what "the issue" is that can be triggered? > we > can ignore that corner case and adjust the model so the shared > shadow stack works for alt stack, but it likely does not change the > jump design: eventually we want alt shadow stack.) As we discussed previously, alt shadow stack can't work transparently with existing code due to the sigaltstack API. I wonder if maybe you are trying to get at something else, and I'm not following. > > - shadow stack cannot always be managed by the runtime transparently: > it has to be allocated for makecontext and alt stack in situations > where allocation failure cannot be handled. more alarmingly the > destruction of stacks may not be visible to the runtime so the > corresponding shadow stacks leak. my preferred way to fix this is > new apis that are shadow stack compatible (e.g. shadow_makecontext > with shadow_freecontext) and marking the incompatible apis as such. > portable code then can decide to update to new apis, run with shstk > disabled or accept the leaks and OOM failures. the current approach > needs ifdef __CET__ in user code for makecontext and sigaltstack > has many issues. This sounds reasonable to me on the face of it. It seems mostly unrelated to the kernel ABI and purely a userspace thing. > > - i'm still not happy with the shadow stack sizing. and would like to > have a token at the end of the shadow stack to allow scanning. and > it would be nice to deal with shadow stack overflow. and there is > async disable on dlopen. so there are things to work on. I was imagining that for tracing-only users, it might make sense to run with WRSS enabled. This could mean libc's could write their own restore tokens. In the case of longjmp() it could be simple and fast. The implementation could just write a token at the target SSP and switch to it. Non C runtimes that want to use if for backtracing could also write their own preferred stack markers or other data. It also is whole different solution to what is being discussed. But over the course of this thread, I could imagine a little more now how a top of stack marker could possibly be useful for non-tracing usages. I have a patch prepared for this and I had tested to see if adding this later could disturb anything in userspace. The only thing that I found was that gdb might output a slightly different stack trace. So it would be a user visible change, if not a regression. One reason I held off on it still, is that the plan for the expanded shadow stack signal frame includes using a 0 frame, to avoid a forgery scenario. The token that makes sense for the end of stack marker is also a 0 frame. So if userspace that looks for the end of stack marker scans for the 0 frame without checking if it is part of an expanded shadow stack signal frame, then it could make more trouble for alt shadow stack. So since they are tied together, and I thought to hold off on it for now. I don't want to try to squeeze around the upstream userspace, I think a version 2 should be a clean slate on a new elf bit. > > i understand that the proposed linux abi makes most existing binaries > with shstk marking work, which is relevant for x86. > > for a while i thought we can fix the remaining issues even if that > means breaking existing shstk binaries (just bump the abi marking). > now it seems the issues can only be addressed in a future abi break. Adding a new arch_prctl() ENABLE value was the plan. Not sure what you mean by ABI break vs version bump. The plan was to add the new features without userspace regression by putting any behavior behind a different enable option. This relies on userspace to add a new elf bit, and to use it. > > which means x86 linux will likely end up maintaining two incompatible > abis and the future one will need user code and build system changes, > not just runtime changes. it is not a small incremental change to add > alt shadow stack support for example. > > i don't think the maintenance burden of two shadow stack abis is the > right path for arm64 to follow, so the shadow stack semantics will > likely become divergent not common across targets. Unfortunately we are at a bit of an information asymmetry here because the ARM spec and patches are not public. It may be part of the cause of the confusion. > > i hope my position is now clearer. It kind of sounds like you don't like the x86 glibc implementation. And you want to make sure the kernel can support whatever a new solution is that you are working on. I am on board with the goal of having some generic set of rules to make portable code work for other architectures shadow stacks. But I think how close we can get to that goal or what it looks like is an open question. For several reasons: 1. Not everyone can see all the specs 2. No POCs have been done (or at least shared) 3. It's not clear what needs to be supported (yes, I know you have made a rough proposal here, but it sounds like on the x86 glibc side at least it's not even clear what non-shadow stack stack switching operations can work together) But towards these goals, I think your technical requests are: 1. Leave a token on switching to an alt shadow stack. As discussed earlier, we can't do this because of the overflow issues. Also since, alt shadow stack cannot be transparent to existing software anyway, it should be ok to introduce limitations. So I think this one is a no. What we could do is introduce security weakening kernel helpers, but this would make sense to come with alt shadow stack support. 2. Add an end token at the top of the shadow stack. Because of the existing userspace restriction interactions, this is complicated to evaluate but I think we *could* do this now. There are pros and cons. 3. Support more options for shadow stack sizing. (I think you are referring to this conversation: https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@arm.com/). I don't see why this is needed for the base implementation. If ARM wants to add a new rlimit or clone variant, I don't see why x86 can't support it later. So if we add 2, are you satisfied? Or otherwise, on the non-technical request side, are you asking to hold off on x86 shadow stack, in order to co-develop a unified solution? I think the existing solution will see use in the meantime, including for the development of all the x86 specific JIT implementations. And finally, what I think is the most important point in all of this: I think that *how* it gets used will be a better guide for further development than us debating. For example the main pain point that has come up so far is the problems around dlopen(). And the future work that has been scoped has been for the kernel to help out in this area. This is based on _user_ (distro) requests. Any apps that don't work with shadow stack limitations can simply not enable shadow stack. You and me are debating these specific API combinations, but we can't know whether they are actually the best place to focus development efforts. And the early signs are this is NOT the most important problem to solve.
On Sun, Jul 02, 2023 at 06:03:42PM +0000, Edgecombe, Rick P wrote: > On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@arm.com wrote: > > which means x86 linux will likely end up maintaining two incompatible > > abis and the future one will need user code and build system changes, > > not just runtime changes. it is not a small incremental change to add > > alt shadow stack support for example. > > i don't think the maintenance burden of two shadow stack abis is the > > right path for arm64 to follow, so the shadow stack semantics will > > likely become divergent not common across targets. > Unfortunately we are at a bit of an information asymmetry here because > the ARM spec and patches are not public. It may be part of the cause of > the confusion. While the descriptive text bit of the spec is not yet integrated into the ARM the architecture XML describing the instructions and system registers is there, the document is numbered DDI0601: https://developer.arm.com/documentation/ddi0601/ The GCS specific instructions and system registers are all named beginning with GCS, it's aarch64 only. Hopefully I should have something out next week for the kernel.
The 07/02/2023 18:03, Edgecombe, Rick P wrote: > On Thu, 2023-06-29 at 17:07 +0100, szabolcs.nagy@arm.com wrote: > > The 06/22/2023 23:18, Edgecombe, Rick P wrote: > > > I'd also appreciate if you could spell out exactly which: > > > - ucontext > > > - signal > > > - longjmp > > > - custom library stack switching > > > > > > patterns you think shadow stack should support working together. > > > Because even after all these mails, I'm still not sure exactly what > > > you > > > are trying to achieve. > > Hi Szablocs, > > Thanks for writing all this up. It is helpful to understand where you > are coming from. Please don't miss my point at the very bottom of this > response. > > > > > i'm trying to support two operations (in any combination): > > > > (1) jump up the current (active) stack. > > > > (2) jump to a live frame in a different inactive but live stack. > > the old stack becomes inactive (= no task executes on it) > > and live (= has valid frames to jump to). > > > > with > > > > (3) the runtime must manage the shadow stacks transparently. > > (= portable c code does not need modifications) > > > > mapping this to c apis: > > > > - swapcontext, setcontext, longjmp, custom stack switching are jump > > operations. (there are conditions under which (1) and (2) must > > work, > > further details don't matter.) > > > > - makecontext creates an inactive live stack. > > > > - signal is only special if it executes on an alt stack: on signal > > entry the alt stack becomes active and the interrupted stack > > inactive but live. (nested signals execute on the alt stack until > > that is left either via a jump or signal return.) > > > > - unwinding can be implemented with jump operations (it needs some > > other things but that's out of scope here). > > > > the patterns that shadow stack should support falls out of this > > model. > > (e.g. posix does not allow jumping from one thread to the stack of a > > different thread, but the model does not care about that, it only > > cares if the target stack is inactive and live then jump should > > work.) > > > > some observations: > > > > - it is necessary for jump to detect case (2) and then switch to the > > target shadow stack. this is also sufficient to implement it. > > (note: > > the restore token can be used for detection since that is > > guaranteed > > to be present when user code creates an inactive live stack and is > > not present anywhere else by design. a different marking can be > > used > > if the inactive live stack is created by the kernel, but then the > > kernel has to provide a switch method, e.g. syscall. this should > > not > > be controversial.) > > For x86's shadow stack you can jump to a new stack without leaving a > token behind. I don't know if maybe we could make it a rule in the > x86_64 ABI that you should always leave a token if you are going to > mark the SHSTK elf bit. But if anything did this, then longjmp() could > never make it back to the stack where setjmp() was called without > kernel help. ok. i didn't know that. the restore token means the stack is valid to execute on later. so from libc pov if the token is missing then jumping to that stack is simply ub. (the kernel could help working around missing tokens but presumably the point of not adding a token is exactly to prevent any kind of jumps to that stack later). so i don't think this changes much. (other than the x86 model has inactive dead stacks that cannot be jumped to, and this is enforced. but libc jumps are not supposed to leave such dead stacks behind so libc jumps would have to add the restore token when doing a switch.) > > - in this model two live stacks cannot use the same shadow stack > > since > > jumping between the two stacks is allowed in both directions, but > > jumping within a shadow stack only works in one direction. (also > > two > > tasks could execute on the same shadow stack then. and it makes > > shadow stack size accounting problematic.) > > > > - so sharing shadow stack with alt stack is broken. (the model is > > right in the sense that valid posix code can trigger the issue. > > Could you spell out what "the issue" is that can be triggered? i meant jumping back from the main to the alt stack: in main: setup sig alt stack setjmp buf1 raise signal on first return longjmp buf2 on second return in signal handler: setjmp buf2 longjmp buf1 on first return can continue after second return in my reading of posix this is valid (and works if signals are masked such that the alt stack is not clobbered when jumping away from it). but cannot work with a single shared shadow stack. > > we > > can ignore that corner case and adjust the model so the shared > > shadow stack works for alt stack, but it likely does not change the > > jump design: eventually we want alt shadow stack.) > > As we discussed previously, alt shadow stack can't work transparently > with existing code due to the sigaltstack API. I wonder if maybe you > are trying to get at something else, and I'm not following. i would like a jump design that works with alt shadow stack. ... > So since they are tied together, and I thought to hold off on it for > now. I don't want to try to squeeze around the upstream userspace, I > think a version 2 should be a clean slate on a new elf bit. ok. > > i understand that the proposed linux abi makes most existing binaries > > with shstk marking work, which is relevant for x86. > > > > for a while i thought we can fix the remaining issues even if that > > means breaking existing shstk binaries (just bump the abi marking). > > now it seems the issues can only be addressed in a future abi break. > > Adding a new arch_prctl() ENABLE value was the plan. Not sure what you > mean by ABI break vs version bump. The plan was to add the new features > without userspace regression by putting any behavior behind a different > enable option. This relies on userspace to add a new elf bit, and to > use it. yes future abi break was the wrong wording: new abi with new elf bit is what i meant. i.e. x86 will have an abi v1 and abi v2, where v2 abi is not compatible e.g. with v1 unwinder. > > which means x86 linux will likely end up maintaining two incompatible > > abis and the future one will need user code and build system changes, > > not just runtime changes. it is not a small incremental change to add > > alt shadow stack support for example. > > > > i don't think the maintenance burden of two shadow stack abis is the > > right path for arm64 to follow, so the shadow stack semantics will > > likely become divergent not common across targets. > > Unfortunately we are at a bit of an information asymmetry here because > the ARM spec and patches are not public. It may be part of the cause of > the confusion. yes that's unfortunate. but in this case i just meant that arm64 does not have existing marked binaries to worry about. so it seems wrong to do a v1 abi and later a v2 abi to fix that up. > It kind of sounds like you don't like the x86 glibc implementation. And > you want to make sure the kernel can support whatever a new solution is > that you are working on. I am on board with the goal of having some > generic set of rules to make portable code work for other architectures > shadow stacks. But I think how close we can get to that goal or what it > looks like is an open question. For several reasons: > 1. Not everyone can see all the specs > 2. No POCs have been done (or at least shared) > 3. It's not clear what needs to be supported (yes, I know you have > made a rough proposal here, but it sounds like on the x86 glibc > side at least it's not even clear what non-shadow stack stack > switching operations can work together) > > But towards these goals, I think your technical requests are: > > 1. Leave a token on switching to an alt shadow stack. As discussed > earlier, we can't do this because of the overflow issues. Also since, i don't think the overflow discussion was conclusive. the kernel could modify the top entry instead of adding a new token. and provide a syscall to switch to that top entry undoing the modification. there might be cornercases and likely not much space for encoding a return address and special marking in an entry. but otherwise this makes jumping to an overflowed shadow stack work. but it is also a valid design to just not support jumping out of alt stack in the specific case of a shadow stack overflow (treat that as a fatal error), but still allow the jump in less fatal situations. > alt shadow stack cannot be transparent to existing software anyway, it maybe not in glibc, but a libc can internally use alt shadow stack in sigaltstack instead of exposing a separate sigaltshadowstack api. (this is what a strict posix conform implementation has to do to support shadow stacks), leaking shadow stacks is not a correctness issue unless it prevents the program working (the shadow stack for the main thread likely wastes more memory than all the alt stack leaks. if the leaks become dominant in a thread the sigaltstack libc api can just fail). > should be ok to introduce limitations. So I think this one is a no. > What we could do is introduce security weakening kernel helpers, but > this would make sense to come with alt shadow stack support. yes this would be for abi v2 on x86. > 2. Add an end token at the top of the shadow stack. Because of the > existing userspace restriction interactions, this is complicated to > evaluate but I think we *could* do this now. There are pros and cons. yes, this is a minor point. (and ok to do differently across targets) > 3. Support more options for shadow stack sizing. (I think you are > referring to this conversation: > https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@arm.com/). I don't see > why this is needed for the base implementation. If ARM wants to add a > new rlimit or clone variant, I don't see why x86 can't support it > later. i think it can be added later. but it may be important for deployment on some platforms, since a libc (or other language runtime) may want to set the shadow stack size differently than the kernel default, because - languages allocating large arrays on the stack (too big shadow stack can cause OOM with overcommit off and rlimits can be hit like RLIMIT_DATA, RLIMIT_AS because of it) - tiny thread stack but big sigaltstack (musl libc, go). > So if we add 2, are you satisfied? Or otherwise, on the non-technical > request side, are you asking to hold off on x86 shadow stack, in order > to co-develop a unified solution? well a unified solution would need a v2 abi with new elf bit, and it seems the preference is a v1 abi first, so at this point i'm just trying to understand the potential brokenness in v1 and possible solutions for v2 since if there is a v2 i would like that to be compatible across targets. > I think the existing solution will see use in the meantime, including > for the development of all the x86 specific JIT implementations. i think some of that work have to be redone if there is a v2 abi, which is why i thought having 2 abis is too much maintenance work. > And finally, what I think is the most important point in all of this: > > I think that *how* it gets used will be a better guide for further > development than us debating. For example the main pain point that has > come up so far is the problems around dlopen(). And the future work > that has been scoped has been for the kernel to help out in this area. > This is based on _user_ (distro) requests. i think dlopen (and how it is used) is part of the api/abi design. in presence of programs that can load any library (e.g. python exe) it is difficult to make use of shstk without runtime disable. however if dlopen gets support for runtime disable, reducing the number of incompatible libraries over time is still relevant, which requires abi/api design that allows that. > Any apps that don't work with shadow stack limitations can simply not > enable shadow stack. You and me are debating these specific API > combinations, but we can't know whether they are actually the best > place to focus development efforts. And the early signs are this is NOT > the most important problem to solve. disabling shadow stack is not simple (removing the elf marking from an exe is often not possible/appropriate, but using an env var that affects an entire process tree has problems too.) but i don't have a solution for this (it is likely a userspace issue). debating the api issues was at least useful for me to understand what can go into the x86 v1 abi and what may go into a v2 abi.
On Mon, Jul 03, 2023 at 07:19:00PM +0100, szabolcs.nagy@arm.com wrote: > The 07/02/2023 18:03, Edgecombe, Rick P wrote: > > 3. Support more options for shadow stack sizing. (I think you are > > referring to this conversation: > > https://lore.kernel.org/lkml/ZAIgrXQ4670gxlE4@arm.com/). I don't see > > why this is needed for the base implementation. If ARM wants to add a > > new rlimit or clone variant, I don't see why x86 can't support it > > later. > i think it can be added later. > but it may be important for deployment on some platforms, since a > libc (or other language runtime) may want to set the shadow stack > size differently than the kernel default, because I agree that this can be deferred to later, so long as we err on the side of large stacks now we can provide an additional limit without confusing things.
* szabolcs: >> alt shadow stack cannot be transparent to existing software anyway, it > > maybe not in glibc, but a libc can internally use alt shadow stack > in sigaltstack instead of exposing a separate sigaltshadowstack api. > (this is what a strict posix conform implementation has to do to > support shadow stacks), leaking shadow stacks is not a correctness > issue unless it prevents the program working (the shadow stack for > the main thread likely wastes more memory than all the alt stack > leaks. if the leaks become dominant in a thread the sigaltstack > libc api can just fail). It should be possible in theory to carve out pages from sigaltstack and push a shadow stack page and a guard page as part of the signal frame. As far as I understand it, the signal frame layout is not ABI, so it's possible to hide arbitrary stuff in it. I'm just saying that it looks possible, not that it's a good idea. Perhaps that's not realistic with 64K pages, though. Thanks, Florian
The 07/03/2023 20:49, Florian Weimer wrote: > * szabolcs: > > >> alt shadow stack cannot be transparent to existing software anyway, it > > > > maybe not in glibc, but a libc can internally use alt shadow stack > > in sigaltstack instead of exposing a separate sigaltshadowstack api. > > (this is what a strict posix conform implementation has to do to > > support shadow stacks), leaking shadow stacks is not a correctness > > issue unless it prevents the program working (the shadow stack for > > the main thread likely wastes more memory than all the alt stack > > leaks. if the leaks become dominant in a thread the sigaltstack > > libc api can just fail). > > It should be possible in theory to carve out pages from sigaltstack and > push a shadow stack page and a guard page as part of the signal frame. > As far as I understand it, the signal frame layout is not ABI, so it's > possible to hide arbitrary stuff in it. I'm just saying that it looks > possible, not that it's a good idea. > > Perhaps that's not realistic with 64K pages, though. interesting idea, but it would not work transparently: the user expects the alt stack memory to be usable as normal memory after longjmping out of a signal handler. this would break code in practice e.g. when a malloced alt stack is passed to free(), the contract there is to not allow changes to the underlying mapping (affects malloc interposition so not possible to paper over inside the libc malloc). so signal entry cannot change the mappings of alt stack. i think kernel internal alt shadow stack allocation works in practice where their lifetime is the same as the thread lifetime. it is sketchy as os interface but doing it in userspace should be fine i think (it's policy what kind of sigaltstack usage is allowed). the kernel is easier in the sense that if there is actual sigreturn then the alt shadow stack can be freed, while libc cannot catch this case (at least not easily). leaked shadow stack can also have security implication but reuse of old alt shadow stack sounds like a minor issue in practice.
On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: > Could you spell out what "the issue" is that can be triggered? > > i meant jumping back from the main to the alt stack: > > in main: > setup sig alt stack > setjmp buf1 > raise signal on first return > longjmp buf2 on second return > > in signal handler: > setjmp buf2 > longjmp buf1 on first return > can continue after second return > > in my reading of posix this is valid (and works if signals are masked > such that the alt stack is not clobbered when jumping away from it). > > but cannot work with a single shared shadow stack. Ah, I see. To make this work seamlessly, you would need to have automatic alt shadow stacks, and as we previously discussed this is not possible with the existing sigaltstack API. (Or at least it seemed like a closed discussion to me). If there is a solution, then we are currently missing a detailed proposal. It looks like further down you proposed leaking alt shadow stacks (quoted up here near the related discussion): On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: > maybe not in glibc, but a libc can internally use alt shadow stack > in sigaltstack instead of exposing a separate sigaltshadowstack api. > (this is what a strict posix conform implementation has to do to > support shadow stacks), leaking shadow stacks is not a correctness > issue unless it prevents the program working (the shadow stack for > the main thread likely wastes more memory than all the alt stack > leaks. if the leaks become dominant in a thread the sigaltstack > libc api can just fail). It seems like your priority must be to make sure pure C apps don't have to make any changes in order to not crash with shadow stack enabled. And this at the expense of any performance and memory usage. Do you have some formalized priorities or design philosophy you can share? Earlier you suggested glibc should create new interfaces to handle makecontext() (makes sense). Shouldn't the same thing happen here? In which case we are in code-changes territory and we should ask ourselves what apps really need. > > > > we > > > can ignore that corner case and adjust the model so the shared > > > shadow stack works for alt stack, but it likely does not change > > > the > > > jump design: eventually we want alt shadow stack.) > > > > As we discussed previously, alt shadow stack can't work > > transparently > > with existing code due to the sigaltstack API. I wonder if maybe > > you > > are trying to get at something else, and I'm not following. > > i would like a jump design that works with alt shadow stack. A shadow stack switch could happen based on the following scenarios: 1. Alt shadow stack 2. ucontext 3. custom stack switching logic If we leave a token on signal, then 1 and 2 could be guaranteed to have a token *somewhere* above where setjmp() could have been called. The algorithm could be to search from the target SSP up the stack until it finds a token, and then switch to it and INCSSP back to the SSP of the setjmp() point. This is what we are talking about, right? And the two problems are: - Alt shadow stack overflow problem - In the case of (3) there might not be a token Let's ignore these problems for a second - now we have a solution that allows you to longjmp() back from an alt stack or ucontext stack. Or at least it works functionally. But is it going to actually work for people who are using longjmp() for things that are supposed to be fast? Like, is this the tradeoff people want? I see some references to fiber switching implementations using longjmp(). I wonder if the existing INCSSP loops are not going to be ideal for every usage already, and this sounds like going further down that road. For jumping out occasionally in some error case, it seems it would be useful. But I think we are then talking about targeting a subset of people using these stack switching patterns. Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH and POP shadow stack instructions? Can ARM just push a restore token at setjmp time, like I was trying to figure out earlier with a push token arch_prctl? It would be good to understand how ARM is going to implement this with these differences in what is allowed by the HW. If there are differences in how locked down/functional the hardware implementations are, and if we want to have some unified set of rules for apps, there will need to some give and take. The x86 approach was mostly to not support all behaviors and ask apps to either change or not enable shadow stacks. We don't want one architecture to have to do a bunch of strange things, but we also don't want one to lose some key end user value. I'm thinking that for pure tracing users, glibc might do things a lot differently (use of WRSS to speed things up). So I'm guessing we will end up with at least one more "policy" on the x86 side. I wonder if maybe we should have something like a "max compatibility" policy/mode where arm/x86/riscv could all behave the same from the glibc caller perspective. We could add kernel help to achieve this for any implementation that is more locked down. And maybe that is x86's v2 ABI. I don't know, just sort of thinking out loud at this point. And this sort of gets back to the point I keep making: if we need to decide tradeoffs, it would be great to get some users to start using this and start telling us what they want. Are people caring mostly about security, compatibility or performance? [snip]
On Wed, Jul 05, 2023 at 06:45:38PM +0000, Edgecombe, Rick P wrote: > Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH and > POP shadow stack instructions? Can ARM just push a restore token at > setjmp time, like I was trying to figure out earlier with a push token > arch_prctl? It would be good to understand how ARM is going to > implement this with these differences in what is allowed by the HW. > If there are differences in how locked down/functional the hardware > implementations are, and if we want to have some unified set of rules > for apps, there will need to some give and take. The x86 approach was > mostly to not support all behaviors and ask apps to either change or > not enable shadow stacks. We don't want one architecture to have to do > a bunch of strange things, but we also don't want one to lose some key > end user value. GCS is all or nothing, either the hardware supports GCS or it doesn't. There are finer grained hypervisor traps (see HFGxTR_EL2 in the system registers) but they aren't intended to be used to disable partial functionality and there's a strong chance we'd just disable the feature in the face of such usage. The kernel does have the option to control which functionality is exposed to userspace, in particular we have separate controls for use of the GCS, the push/pop instructions and the store instructions (similarly to the control x86 has for WRSS). Similarly to the handling of WRSS in your series my patches allow userspace to choose which of these features are enabled.
On Wed, 2023-07-05 at 20:10 +0100, Mark Brown wrote: > On Wed, Jul 05, 2023 at 06:45:38PM +0000, Edgecombe, Rick P wrote: > > > Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH > > and > > POP shadow stack instructions? Can ARM just push a restore token at > > setjmp time, like I was trying to figure out earlier with a push > > token > > arch_prctl? It would be good to understand how ARM is going to > > implement this with these differences in what is allowed by the HW. > > > If there are differences in how locked down/functional the hardware > > implementations are, and if we want to have some unified set of > > rules > > for apps, there will need to some give and take. The x86 approach > > was > > mostly to not support all behaviors and ask apps to either change > > or > > not enable shadow stacks. We don't want one architecture to have to > > do > > a bunch of strange things, but we also don't want one to lose some > > key > > end user value. > > GCS is all or nothing, either the hardware supports GCS or it > doesn't. > There are finer grained hypervisor traps (see HFGxTR_EL2 in the > system > registers) but they aren't intended to be used to disable partial > functionality and there's a strong chance we'd just disable the > feature > in the face of such usage. The kernel does have the option to > control > which functionality is exposed to userspace, in particular we have > separate controls for use of the GCS, the push/pop instructions and > the > store instructions (similarly to the control x86 has for WRSS). > Similarly to the handling of WRSS in your series my patches allow > userspace to choose which of these features are enabled. Ah, interesting, thanks for the extra info. So which features is glibc planning to use? (probably more of a question for Szabolcs). Are push and pop controllable separately?
On Wed, Jul 05, 2023 at 07:17:25PM +0000, Edgecombe, Rick P wrote: > Ah, interesting, thanks for the extra info. So which features is glibc > planning to use? (probably more of a question for Szabolcs). Are push > and pop controllable separately? Push and pop are one control, you get both or neither. I'll defer to Szabolcs on glibc plans.
The 07/05/2023 18:45, Edgecombe, Rick P wrote: > On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: > > Could you spell out what "the issue" is that can be triggered? > > > > i meant jumping back from the main to the alt stack: > > > > in main: > > setup sig alt stack > > setjmp buf1 > > raise signal on first return > > longjmp buf2 on second return > > > > in signal handler: > > setjmp buf2 > > longjmp buf1 on first return > > can continue after second return > > > > in my reading of posix this is valid (and works if signals are masked > > such that the alt stack is not clobbered when jumping away from it). > > > > but cannot work with a single shared shadow stack. > > Ah, I see. To make this work seamlessly, you would need to have > automatic alt shadow stacks, and as we previously discussed this is not > possible with the existing sigaltstack API. (Or at least it seemed like > a closed discussion to me). > > If there is a solution, then we are currently missing a detailed > proposal. It looks like further down you proposed leaking alt shadow > stacks (quoted up here near the related discussion): > > On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: > > maybe not in glibc, but a libc can internally use alt shadow stack > > in sigaltstack instead of exposing a separate sigaltshadowstack api. > > (this is what a strict posix conform implementation has to do to > > support shadow stacks), leaking shadow stacks is not a correctness > > issue unless it prevents the program working (the shadow stack for > > the main thread likely wastes more memory than all the alt stack > > leaks. if the leaks become dominant in a thread the sigaltstack > > libc api can just fail). > > It seems like your priority must be to make sure pure C apps don't have > to make any changes in order to not crash with shadow stack enabled. > And this at the expense of any performance and memory usage. Do you > have some formalized priorities or design philosophy you can share? > > Earlier you suggested glibc should create new interfaces to handle > makecontext() (makes sense). Shouldn't the same thing happen here? In > which case we are in code-changes territory and we should ask ourselves > what apps really need. instead of priority, i'd say "posix conform c apps work without change" is a benchmark i use to see if the design is sound. i do not have a particular workload (or distro) in mind, so i have to reason through the cases that make sense and the current linux syscall abi allows, but fail or difficult to support with shadow stacks. one such case is jumping back to an alt stack (i.e. inactive live alt stack): - with shared shadow stack this does not work in many cases. - with alt shadow stack this extends the lifetime beyond the point it become inactive (so it cannot be freed). if there are no inactive live alt stacks then *both* shared and implicit alt shadow stack works. and to me it looked like implicit alt shadow stack is simply better of those two (more alt shadow stack use-cases are supported, shadow stack overflow can be handled. drawback: complications due to the discontinous shadow stack.) on arm64 i personally don't like the idea of "deal with alt shadow stack later" because it likely requires a v2 abi affecting the unwinder and jump implementations. (later extensions are fine if they are bw compat and discoverable) one nasty case is shadow stack overflow handling, but i think i have a solution for that (not the nicest thing: it involves setting the top bit on the last entry on the shadow stack instead of adding a new entry to it. + a new syscall that can switch to this entry. i havent convinced myself about this yet). > > > > > > > we > > > > can ignore that corner case and adjust the model so the shared > > > > shadow stack works for alt stack, but it likely does not change > > > > the > > > > jump design: eventually we want alt shadow stack.) > > > > > > As we discussed previously, alt shadow stack can't work > > > transparently > > > with existing code due to the sigaltstack API. I wonder if maybe > > > you > > > are trying to get at something else, and I'm not following. > > > > i would like a jump design that works with alt shadow stack. > > A shadow stack switch could happen based on the following scenarios: > 1. Alt shadow stack > 2. ucontext > 3. custom stack switching logic > > If we leave a token on signal, then 1 and 2 could be guaranteed to have > a token *somewhere* above where setjmp() could have been called. > > The algorithm could be to search from the target SSP up the stack until > it finds a token, and then switch to it and INCSSP back to the SSP of > the setjmp() point. This is what we are talking about, right? > > And the two problems are: > - Alt shadow stack overflow problem > - In the case of (3) there might not be a token > > Let's ignore these problems for a second - now we have a solution that > allows you to longjmp() back from an alt stack or ucontext stack. Or at > least it works functionally. But is it going to actually work for > people who are using longjmp() for things that are supposed to be fast? slow longjmp is bad. (well longjmp is actually always slow in glibc because it sets the signalmask with a syscall, but there are other jump operations that don't do this and want to be fast so yes we want fast jump to be possible). jumping up the shadow stack is at least linear time in the number of frames jumped over (which already sounds significant slowdown however this is amortized by the fact that the stack frames had to be created at some point and that is actually a lot more expensive because it involves write operations, so a zero cost jump will not do any asymptotic speedup compared to a linear cost jump as far as i can see.). with my proposed solution the jump is still linear. (i know x86 incssp can jump many entries at a time and does not have to actually read and check the entries, but technically it's linear time too: you have to do at least one read per page to have the guardpage protection). this all looks fine to me even for extreme made up workloads. > Like, is this the tradeoff people want? I see some references to fiber > switching implementations using longjmp(). I wonder if the existing > INCSSP loops are not going to be ideal for every usage already, and > this sounds like going further down that road. > > For jumping out occasionally in some error case, it seems it would be > useful. But I think we are then talking about targeting a subset of > people using these stack switching patterns. > i simply don't see any trade-off here (i expect no measurable difference with a scanning vs no scanning approach even in a microbenchmark that does longjmp in a loop independently of the stack switch pattern and even if the non-scanning implementation can do wrss). > Looking at the docs Mark linked (thanks!), ARM has generic GCS PUSH and > POP shadow stack instructions? Can ARM just push a restore token at > setjmp time, like I was trying to figure out earlier with a push token > arch_prctl? It would be good to understand how ARM is going to > implement this with these differences in what is allowed by the HW. > > If there are differences in how locked down/functional the hardware > implementations are, and if we want to have some unified set of rules > for apps, there will need to some give and take. The x86 approach was > mostly to not support all behaviors and ask apps to either change or > not enable shadow stacks. We don't want one architecture to have to do > a bunch of strange things, but we also don't want one to lose some key > end user value. > > I'm thinking that for pure tracing users, glibc might do things a lot > differently (use of WRSS to speed things up). So I'm guessing we will > end up with at least one more "policy" on the x86 side. > > I wonder if maybe we should have something like a "max compatibility" > policy/mode where arm/x86/riscv could all behave the same from the > glibc caller perspective. We could add kernel help to achieve this for > any implementation that is more locked down. And maybe that is x86's v2 > ABI. I don't know, just sort of thinking out loud at this point. And > this sort of gets back to the point I keep making: if we need to decide > tradeoffs, it would be great to get some users to start using this and > start telling us what they want. Are people caring mostly about > security, compatibility or performance? > > [snip] >
The 07/05/2023 20:29, Mark Brown wrote: > On Wed, Jul 05, 2023 at 07:17:25PM +0000, Edgecombe, Rick P wrote: > > > Ah, interesting, thanks for the extra info. So which features is glibc > > planning to use? (probably more of a question for Szabolcs). Are push > > and pop controllable separately? > > Push and pop are one control, you get both or neither. > > I'll defer to Szabolcs on glibc plans. gcspopm is always available (esentially *ssp++, this is used for longjmp). i haven't planned anything yet for other modes (i dont know anything where writable shadow stack is better than just turning the feature off, so i expect we at most have a glibc tunable env var to enable it but it will not affect glibc behaviour otherwise).
On Thu, Jul 06, 2023 at 02:14:40PM +0100, szabolcs.nagy@arm.com wrote: > The 07/05/2023 20:29, Mark Brown wrote: > > Push and pop are one control, you get both or neither. > gcspopm is always available (esentially *ssp++, this is used > for longjmp). Ah, sorry - I misremembered there. You're right, it's only push that we have control over.
On Thu, 2023-07-06 at 15:24 +0100, Mark Brown wrote: > On Thu, Jul 06, 2023 at 02:14:40PM +0100, > szabolcs.nagy@arm.com wrote: > > The 07/05/2023 20:29, Mark Brown wrote: > > > > Push and pop are one control, you get both or neither. > > > gcspopm is always available (esentially *ssp++, this is used > > for longjmp). > > Ah, sorry - I misremembered there. You're right, it's only push that > we > have control over. Ah, ok! So if you are not planning to enable the push mode then the features are pretty well aligned, except: - On x86 it is possible to switch stacks without leaving a token behind. - The GCSPOPM/INCSSP looping may require longer loops on ARM because it only pops one at at time. If you are not going to use GCSPUSHM by default, then I think we *should* be able to have some unified set of rules for developers for glibc behaviors at least.
On Thu, 2023-07-06 at 14:07 +0100, szabolcs.nagy@arm.com wrote: [ snip ] > > instead of priority, i'd say "posix conform c apps work > without change" is a benchmark i use to see if the design > is sound. This involves leaking shadow stacks for sigaltstack and makecontext, though, right? This seems kind of wrong. It might be useful for avoiding crashes at all costs, but probably shouldn't be the long term solution. I thought your API updates were the right direction. But, I'm of course not a glibc developer. HJ and friends would have to agree to all of that. > > i do not have a particular workload (or distro) in mind, so > i have to reason through the cases that make sense and the > current linux syscall abi allows, but fail or difficult to > support with shadow stacks. > > one such case is jumping back to an alt stack (i.e. inactive > live alt stack): > > - with shared shadow stack this does not work in many cases. > > - with alt shadow stack this extends the lifetime beyond the > point it become inactive (so it cannot be freed). > > if there are no inactive live alt stacks then *both* shared > and implicit alt shadow stack works. and to me it looked > like implicit alt shadow stack is simply better of those two > (more alt shadow stack use-cases are supported, shadow stack > overflow can be handled. drawback: complications due to the > discontinous shadow stack.) > > on arm64 i personally don't like the idea of "deal with alt > shadow stack later" because it likely requires a v2 abi > affecting the unwinder and jump implementations. (later > extensions are fine if they are bw compat and discoverable) I think you could do it, if your signal handler can push data on the shadow stack like x86 does. I'd start with a padded shadow stack signal frame though, and not trust userspace to parse it. > > one nasty case is shadow stack overflow handling, but i > think i have a solution for that (not the nicest thing: > it involves setting the top bit on the last entry on the > shadow stack instead of adding a new entry to it. + a new > syscall that can switch to this entry. i havent convinced > myself about this yet). There might be some complicated thing around storing the last shadow stack entry into the shadow stack sigframe and restoring it on sigreturn. Then writing a token from the kernel to where the saved frame was to live there in the meantime. But to me this whole search, restore and INCSSP thing is suspect at this point though. We could also increase compatibility and performance more simply, by adding kernel help, at the expense of security. [ snip ] > slow longjmp is bad. (well longjmp is actually always slow > in glibc because it sets the signalmask with a syscall, but > there are other jump operations that don't do this and want > to be fast so yes we want fast jump to be possible). > > jumping up the shadow stack is at least linear time in the > number of frames jumped over (which already sounds significant > slowdown however this is amortized by the fact that the stack > frames had to be created at some point and that is actually a > lot more expensive because it involves write operations, so a > zero cost jump will not do any asymptotic speedup compared to > a linear cost jump as far as i can see.). > > with my proposed solution the jump is still linear. (i know > x86 incssp can jump many entries at a time and does not have > to actually read and check the entries, but technically it's > linear time too: you have to do at least one read per page to > have the guardpage protection). this all looks fine to me > even for extreme made up workloads. Well I guess we are talking about hypothetical performance. But linear time is still worse than O(1). And I thought longjmp() was supposed to be an O(1) type thing. Separate from all of this...now that all the constraints are clearer, if you have changed your mind on whether this series is ready, could you comment at the top of this thread something to that effect? I'm imagining not many are reading so far down at this point. For my part, I think we should go forward with what we have on the kernel side, unless glibc/gcc developers would like to start by deprecating the existing binaries. I just talked with HJ, and he has not changed his plans around this. If anyone else in that community has (Florian?), please speak up. But otherwise I think it's better to start getting real world feedback and grow based on that.
On Thu, Jul 06, 2023 at 04:59:45PM +0000, Edgecombe, Rick P wrote: > On Thu, 2023-07-06 at 15:24 +0100, Mark Brown wrote: > > szabolcs.nagy@arm.com wrote: > > > The 07/05/2023 20:29, Mark Brown wrote: > > > gcspopm is always available (esentially *ssp++, this is used > > > for longjmp). > > Ah, sorry - I misremembered there. You're right, it's only push that > > we > > have control over. FWIW the confusion there was some of the hypervisor features which do tie some of the push and pop instructions together. > Ah, ok! So if you are not planning to enable the push mode then the > features are pretty well aligned, except: > - On x86 it is possible to switch stacks without leaving a token > behind. > - The GCSPOPM/INCSSP looping may require longer loops on ARM > because it only pops one at at time. > If you are not going to use GCSPUSHM by default, then I think we > *should* be able to have some unified set of rules for developers for > glibc behaviors at least. Yes, the only case where I am aware of conciously diverging in any substantial way is that we do not free the GCS when GCS is disabled by userspace, we just disable the updates and checks, and reenabling after disabling is not supported. We have demand for disabling at runtime so we want to keep the stack around for things like a running unwinder but we don't see a practical use for reenabling so didn't worry about figuring out what would make sense for userspace. glibc isn't going to be using that though.
The 07/06/2023 18:25, Edgecombe, Rick P wrote: > On Thu, 2023-07-06 at 14:07 +0100, szabolcs.nagy@arm.com wrote: > > [ snip ] > > > > instead of priority, i'd say "posix conform c apps work > > without change" is a benchmark i use to see if the design > > is sound. > > This involves leaking shadow stacks for sigaltstack and makecontext, > though, right? This seems kind of wrong. It might be useful for > avoiding crashes at all costs, but probably shouldn't be the long term > solution. I thought your API updates were the right direction. new apis are not enough. existing apis either must do something reasonable or shstk must be disabled when that api is used (sigaltstack, makecontext). the disable does not really work as the apis are widely used and there is no 'disable shstk locally' it is viral when a widely used dependency is affected. so apis must do something reasonable. however there will be remaining issues and that will need new apis which can take a long time to transition to. > > one nasty case is shadow stack overflow handling, but i > > think i have a solution for that (not the nicest thing: > > it involves setting the top bit on the last entry on the > > shadow stack instead of adding a new entry to it. + a new > > syscall that can switch to this entry. i havent convinced > > myself about this yet). > > There might be some complicated thing around storing the last shadow > stack entry into the shadow stack sigframe and restoring it on > sigreturn. Then writing a token from the kernel to where the saved > frame was to live there in the meantime. this only works if you jump from the alt stack to the overflowed stack, not if you jump somewhere else in between. this is a would be nice to solve but not the most important case. > > But to me this whole search, restore and INCSSP thing is suspect at > this point though. We could also increase compatibility and performance > more simply, by adding kernel help, at the expense of security. what is the kernel help? (and security trade-off) and why is scan+restore+incssp suspect? the reasoning i've seen are - some ppl might not add restore token: sounds fine, it's already ub to jump to such stack either way. - it is slow: please give an example where there is slowdown. (the slowdown has to be larger than the call/ret overhead) - jump to overflowed shadow stack: i'm fairly sure this can be done (but indeed complicated), and if that's not acceptable then not supporting this case is better than not supporting reliable crash handling (alt stack handler can overflow the shadow stack and shadow stack overflow cannot be handled). > > slow longjmp is bad. (well longjmp is actually always slow > > in glibc because it sets the signalmask with a syscall, but > > there are other jump operations that don't do this and want > > to be fast so yes we want fast jump to be possible). > > > > jumping up the shadow stack is at least linear time in the > > number of frames jumped over (which already sounds significant > > slowdown however this is amortized by the fact that the stack > > frames had to be created at some point and that is actually a > > lot more expensive because it involves write operations, so a > > zero cost jump will not do any asymptotic speedup compared to > > a linear cost jump as far as i can see.). > > > > with my proposed solution the jump is still linear. (i know > > x86 incssp can jump many entries at a time and does not have > > to actually read and check the entries, but technically it's > > linear time too: you have to do at least one read per page to > > have the guardpage protection). this all looks fine to me > > even for extreme made up workloads. > > Well I guess we are talking about hypothetical performance. But linear > time is still worse than O(1). And I thought longjmp() was supposed to > be an O(1) type thing. longjmp is not O(1) with your proposed abi. and i don't think linear time is worse than O(1) in this case. > Separate from all of this...now that all the constraints are clearer, > if you have changed your mind on whether this series is ready, could > you comment at the top of this thread something to that effect? I'm > imagining not many are reading so far down at this point. > > For my part, I think we should go forward with what we have on the > kernel side, unless glibc/gcc developers would like to start by > deprecating the existing binaries. I just talked with HJ, and he has > not changed his plans around this. If anyone else in that community has > (Florian?), please speak up. But otherwise I think it's better to start > getting real world feedback and grow based on that. > the x86 v1 abi tries to be compatible with existing unwinders. (are there other binaries that constrains v1? portable code should be fine as they rely on libc which we can still change) i will have to discuss the arm plan with the arm kernel devs. the ugly bit i want to avoid on arm is to have to reimplement unwind and jump ops to make alt shadow stack work in a v2 abi. i think the worse bit of the x86 v1 abi is that crash handlers don't work reliably (e.g. a crash on a tiny makecontext stack with the usual sigaltstack crash handler can unrecoverably fail during crash handling). i guess this can be somewhat mitigated by both linux and libc adding an extra page to the shadow stack size to guarantee that alt stack handlers with certain depth always work.
On Fri, 2023-07-07 at 16:25 +0100, szabolcs.nagy@arm.com wrote: > > Separate from all of this...now that all the constraints are > > clearer, > > if you have changed your mind on whether this series is ready, > > could > > you comment at the top of this thread something to that effect? I'm > > imagining not many are reading so far down at this point. > > > > For my part, I think we should go forward with what we have on the > > kernel side, unless glibc/gcc developers would like to start by > > deprecating the existing binaries. I just talked with HJ, and he > > has > > not changed his plans around this. If anyone else in that community > > has > > (Florian?), please speak up. But otherwise I think it's better to > > start > > getting real world feedback and grow based on that. > > > > the x86 v1 abi tries to be compatible with existing unwinders. > (are there other binaries that constrains v1? portable code > should be fine as they rely on libc which we can still change) > > i will have to discuss the arm plan with the arm kernel devs. > the ugly bit i want to avoid on arm is to have to reimplement > unwind and jump ops to make alt shadow stack work in a v2 abi. > > i think the worse bit of the x86 v1 abi is that crash handlers > don't work reliably (e.g. a crash on a tiny makecontext stack > with the usual sigaltstack crash handler can unrecoverably fail > during crash handling). i guess this can be somewhat mitigated > by both linux and libc adding an extra page to the shadow stack > size to guarantee that alt stack handlers with certain depth > always work. Some mails back, I listed the three things you might be asking for from the kernel side and pointedly asked you to clarify. The only one you still were wishing for up front was "Leave a token on switching to an alt shadow stack." But how you want to use this involves a lot of details for how glibc will work (automatic shadow stack for sigaltstack, scan-restore-incssp, etc). I think you first need to get the story straight with other libc developers, otherwise this is just brainstorming. I'm not a glibc contributor, so winning me over is only half the battle. Only after that is settled do we get to the problem of the old libgcc unwinders, and how it is a challenge to even add alt shadow stack given glibc's plans and the existing binaries. Once that is solved we are at the overflow problem, and the current state of thinking on that is "i'm fairly sure this can be done (but indeed complicated)". So I think we are still missing any actionable requests that should hold this up. Is this a reasonable summary?
The 07/07/2023 17:37, Edgecombe, Rick P wrote: > On Fri, 2023-07-07 at 16:25 +0100, szabolcs.nagy@arm.com wrote: > > > Separate from all of this...now that all the constraints are > > > clearer, > > > if you have changed your mind on whether this series is ready, > > > could > > > you comment at the top of this thread something to that effect? I'm > > > imagining not many are reading so far down at this point. > > > > > > For my part, I think we should go forward with what we have on the > > > kernel side, unless glibc/gcc developers would like to start by > > > deprecating the existing binaries. I just talked with HJ, and he > > > has > > > not changed his plans around this. If anyone else in that community > > > has > > > (Florian?), please speak up. But otherwise I think it's better to > > > start > > > getting real world feedback and grow based on that. > > > > > > > the x86 v1 abi tries to be compatible with existing unwinders. > > (are there other binaries that constrains v1? portable code > > should be fine as they rely on libc which we can still change) > > > > i will have to discuss the arm plan with the arm kernel devs. > > the ugly bit i want to avoid on arm is to have to reimplement > > unwind and jump ops to make alt shadow stack work in a v2 abi. > > > > i think the worse bit of the x86 v1 abi is that crash handlers > > don't work reliably (e.g. a crash on a tiny makecontext stack > > with the usual sigaltstack crash handler can unrecoverably fail > > during crash handling). i guess this can be somewhat mitigated > > by both linux and libc adding an extra page to the shadow stack > > size to guarantee that alt stack handlers with certain depth > > always work. > > Some mails back, I listed the three things you might be asking for from > the kernel side and pointedly asked you to clarify. The only one you > still were wishing for up front was "Leave a token on switching to an > alt shadow stack." > > But how you want to use this involves a lot of details for how glibc > will work (automatic shadow stack for sigaltstack, scan-restore-incssp, > etc). I think you first need to get the story straight with other libc > developers, otherwise this is just brainstorming. I'm not a glibc > contributor, so winning me over is only half the battle. > > Only after that is settled do we get to the problem of the old libgcc > unwinders, and how it is a challenge to even add alt shadow stack given > glibc's plans and the existing binaries. > > Once that is solved we are at the overflow problem, and the current > state of thinking on that is "i'm fairly sure this can be done (but > indeed complicated)". > > So I think we are still missing any actionable requests that should > hold this up. > > Is this a reasonable summary? not entirely. the high level requirement is a design that a) does not break many existing sigaltstack uses, b) allows implementing jump and unwind that support the relevant use-cases around signals and stack switches with minimal userspace changes. where (b) has nothing to add to v1 abi: existing unwind binaries mean this needs a v2 abi. (the point of discussing v2 ahead of time is to understand the cost of v2 and the divergence wrt targets without abi compat issue.) for (a) my actionable suggestion was to account altstack when sizing shadow stacks. to document an altstack call depth limit on the libc level (e.g. fixed 100 is fine) we need guarantees from the kernel. (consider recursive calls overflowing the stack with altstack crash handler: for this to be reliable shadow stack size > stack size is needed. but the diff can be tiny e.g. 1 page is enough.) your previous 3 actionable item list was 1. add token when handling signals on altstack. this falls under (b). your summary is correct that this requires sorting out many fiddly details. 2. top of stack token. this can work differently across targets so i have nothing against the x86 v1 abi, but on arm64 we plan to have this. 3. more shadow stack sizing policies. this can be done in the future except the default policy should be fixed for (a) and a smaller size introduces the overflow issue which may require v2. in short the only important change for v1 is shstk sizing.
On Mon, 2023-07-10 at 17:54 +0100, szabolcs.nagy@arm.com wrote: > > Some mails back, I listed the three things you might be asking for > > from > > the kernel side and pointedly asked you to clarify. The only one > > you > > still were wishing for up front was "Leave a token on switching to > > an > > alt shadow stack." > > > > But how you want to use this involves a lot of details for how > > glibc > > will work (automatic shadow stack for sigaltstack, scan-restore- > > incssp, > > etc). I think you first need to get the story straight with other > > libc > > developers, otherwise this is just brainstorming. I'm not a glibc > > contributor, so winning me over is only half the battle. > > > > Only after that is settled do we get to the problem of the old > > libgcc > > unwinders, and how it is a challenge to even add alt shadow stack > > given > > glibc's plans and the existing binaries. > > > > Once that is solved we are at the overflow problem, and the current > > state of thinking on that is "i'm fairly sure this can be done (but > > indeed complicated)". > > > > So I think we are still missing any actionable requests that should > > hold this up. > > > > Is this a reasonable summary? > > not entirely. > > the high level requirement is a design that > > a) does not break many existing sigaltstack uses, > > b) allows implementing jump and unwind that support the > relevant use-cases around signals and stack switches > with minimal userspace changes. Please open a discussion with the other glibc developers that have been involved with shadow stack regarding this subject(b). Please include me (and probably AndyL would be interested?). I think we've talked it through as much as you and I can at this point. Let's at least start a new more focused thread on the "unwind across stacks" problem. And also get some consensus on the wisdom of the related suggestion to leak shadow stacks in order to transparently support existing posix APIs. > > where (b) has nothing to add to v1 abi: existing unwind > binaries mean this needs a v2 abi. (the point of discussing > v2 ahead of time is to understand the cost of v2 and the > divergence wrt targets without abi compat issue.) > > for (a) my actionable suggestion was to account altstack > when sizing shadow stacks. to document an altstack call > depth limit on the libc level (e.g. fixed 100 is fine) we > need guarantees from the kernel. (consider recursive calls > overflowing the stack with altstack crash handler: for this > to be reliable shadow stack size > stack size is needed. > but the diff can be tiny e.g. 1 page is enough.) > > your previous 3 actionable item list was > > 1. add token when handling signals on altstack. > > this falls under (b). your summary is correct that this > requires sorting out many fiddly details. > > 2. top of stack token. > > this can work differently across targets so i have nothing > against the x86 v1 abi, but on arm64 we plan to have this. > > 3. more shadow stack sizing policies. > > this can be done in the future except the default policy > should be fixed for (a) and a smaller size introduces the > overflow issue which may require v2. > > in short the only important change for v1 is shstk sizing. I tried searching through this long thread and AFAICT this is a new idea. Sorry if I missed something, but your previous answer on this(3) seemed concerned with the opposite problem (oversized shadow stacks). Quoted from a past mail: On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: > i think it can be added later. > > but it may be important for deployment on some platforms, since a > libc (or other language runtime) may want to set the shadow stack > size differently than the kernel default, because > > - languages allocating large arrays on the stack > (too big shadow stack can cause OOM with overcommit off and > rlimits can be hit like RLIMIT_DATA, RLIMIT_AS because of it) > > - tiny thread stack but big sigaltstack (musl libc, go). So you can probably see how I got the impression that 3 was closed. But anyways, ok, so if we add a page to every thread allocated shadow stack, then you can guarantee that an alt stack can have some room to handle at least a single alt stack signal, even in the case of exhausting the entire stack by recursively making calls and pushing nothing else to the stack. SS_AUTODISARM remains a bit muddy. Also glibc would have to size ucontext shadow stacks with an additional page as well. I think it would be good to get some other signs of interest on this tweak due to the requirements for glibc to participate on the scheme. Can you gather that quickly, so we can get this all prepped again? To me (unless I'm missing something), it seems like complicating the equation for probably no real world benefit due to the low chances of exhausting a shadow stack. But if there is consensus on the glibc side, then I'm happy to make the change to finally settle this discussion.
The 07/10/2023 22:56, Edgecombe, Rick P wrote: > On Mon, 2023-07-10 at 17:54 +0100, szabolcs.nagy@arm.com wrote: > > in short the only important change for v1 is shstk sizing. > > I tried searching through this long thread and AFAICT this is a new > idea. Sorry if I missed something, but your previous answer on this(3) > seemed concerned with the opposite problem (oversized shadow stacks). > > Quoted from a past mail: > On Mon, 2023-07-03 at 19:19 +0100, szabolcs.nagy@arm.com wrote: ... > > - tiny thread stack but big sigaltstack (musl libc, go). and 4 months earlier: > Date: Fri, 3 Mar 2023 16:30:37 +0000 > Subject: Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description > > Looking at this again, I'm not sure why a new rlimit is needed. It > > seems many of those points were just formulations of that the clone3 > > stack size was not used, but it actually is and just not documented. If > > you disagree perhaps you could elaborate on what the requirements are > > and we can see if it seems tricky to do in a follow up. > > - tiny thread stack and deep signal stack. > (note that this does not really work with glibc because it has > implementation internal signals that don't run on alt stack, > cannot be masked and don't fit on a tiny thread stack, but > with other runtimes this can be a valid use-case, e.g. musl > allows tiny thread stacks, < pagesize.) ... that analysis was wrong: it considered handling any signals at all, but if the stack overflows due to a recursive call with empty stack frames then the stack gets used up at the same rate as the shadow stack and since the signal handler on the alt stack uses the same shadow stack that can easily overflow too independently of the stack size. this is a big deal as it affects what operations the libc can support and handling stack overflow is a common requirement. i originally argued for a fix using separate alt shadow stacks, but since it became clear that does not work for the x86 v1 abi i am recommending the size increase. with shadow stack size = stack size + 1 page, libc can document a depth limit for the alt stack that works. (longjumping back to the alt stack is still broken though, but that's rarely a requirement) > Also glibc would have to size ucontext shadow stacks with an additional yes, assuming glibc wants to support sigaltstack. > page as well. I think it would be good to get some other signs of > interest on this tweak due to the requirements for glibc to participate > on the scheme. Can you gather that quickly, so we can get this all > prepped again? i can cc libc-alpha. the decision is for x86 shadow stack linux abi to use shadow stack size = stack size or shadow stack size = stack size + 1 page as default policy when alt stack signals use the same shadow stack, not a separate one. note: smallest stack frame size is 8bytes, same as the shadow stack entry. on a target where smallest frame size is 2x shadow stack entry size, the formula would use (stack size / 2). note: there is no api to change the policy from userspace at this point.
The 07/11/2023 09:08, szabolcs.nagy--- via Libc-alpha wrote: > the decision is for x86 shadow stack linux abi to use > > shadow stack size = stack size > > or > > shadow stack size = stack size + 1 page > > as default policy when alt stack signals use the same > shadow stack, not a separate one. > > note: smallest stack frame size is 8bytes, same as the > shadow stack entry. on a target where smallest frame > size is 2x shadow stack entry size, the formula would > use (stack size / 2). i convinced myself that shadow stack size = stack size works: libc can reserve N bytes on the initial stack frame so when the stack overflows there will be at least N bytes on the shadow stack usable for signal handling. this is only bad for tiny user allocated stacks where libc should not consume too much stack space. but e.g. glibc already uses >128 bytes on the initial stack frame for its cancellation jumpbuf so 16 deep signal call stack is already guaranteed to work. the glibc makecontext code has to be adjusted, but that's a libc side discussion. the shadow stack of the main stack can still overflow, but that requires increasing RLIMIT_STACK at runtime which is not very common.
The 06/12/2023 17:10, Rick Edgecombe wrote: > Introduce a new document on Control-flow Enforcement Technology (CET). > > Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> > Reviewed-by: Kees Cook <keescook@chromium.org> > Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> > Tested-by: Pengfei Xu <pengfei.xu@intel.com> > Tested-by: John Allen <john.allen@amd.com> > Tested-by: Kees Cook <keescook@chromium.org> i don't have more comments on this. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst index c73d133fd37c..8ac64d7de4dc 100644 --- a/Documentation/arch/x86/index.rst +++ b/Documentation/arch/x86/index.rst @@ -22,6 +22,7 @@ x86-specific Documentation mtrr pat intel-hfi + shstk iommu intel_txt amd-memory-encryption diff --git a/Documentation/arch/x86/shstk.rst b/Documentation/arch/x86/shstk.rst new file mode 100644 index 000000000000..f09afa504ec0 --- /dev/null +++ b/Documentation/arch/x86/shstk.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================================== +Control-flow Enforcement Technology (CET) Shadow Stack +====================================================== + +CET Background +============== + +Control-flow Enforcement Technology (CET) covers several related x86 processor +features that provide protection against control flow hijacking attacks. CET +can protect both applications and the kernel. + +CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack +is a secondary stack allocated from memory which cannot be directly modified by +applications. When executing a CALL instruction, the processor pushes the +return address to both the normal stack and the shadow stack. Upon +function return, the processor pops the shadow stack copy and compares it +to the normal stack copy. If the two differ, the processor raises a +control-protection fault. IBT verifies indirect CALL/JMP targets are intended +as marked by the compiler with 'ENDBR' opcodes. Not all CPU's have both Shadow +Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace +shadow stack and kernel IBT are supported. + +Requirements to use Shadow Stack +================================ + +To use userspace shadow stack you need HW that supports it, a kernel +configured with it and userspace libraries compiled with it. + +The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow +stacks can be disabled at runtime with the kernel parameter: nousershstk. + +To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later +are required. + +At run time, /proc/cpuinfo shows CET features if the processor supports +CET. "user_shstk" means that userspace shadow stack is supported on the current +kernel and HW. + +Application Enabling +==================== + +An application's CET capability is marked in its ELF note and can be verified +from readelf/llvm-readelf output:: + + readelf -n <application> | grep -a SHSTK + properties: x86 feature: SHSTK + +The kernel does not process these applications markers directly. Applications +or loaders must enable CET features using the interface described in section 4. +Typically this would be done in dynamic loader or static runtime objects, as is +the case in GLIBC. + +Enabling arch_prctl()'s +======================= + +Elf features should be enabled by the loader using the below arch_prctl's. They +are only supported in 64 bit user applications. These operate on the features +on a per-thread basis. The enablement status is inherited on clone, so if the +feature is enabled on the first thread, it will propagate to all the thread's +in an app. + +arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature) + Enable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature) + Disable a single feature specified in 'feature'. Can only operate on + one feature at a time. + +arch_prctl(ARCH_SHSTK_LOCK, unsigned long features) + Lock in features at their current enabled or disabled status. 'features' + is a mask of all features to lock. All bits set are processed, unset bits + are ignored. The mask is ORed with the existing value. So any feature bits + set here cannot be enabled or disabled afterwards. + +The return values are as follows. On success, return 0. On error, errno can +be:: + + -EPERM if any of the passed feature are locked. + -ENOTSUPP if the feature is not supported by the hardware or + kernel. + -EINVAL arguments (non existing feature, etc) + +The feature's bits supported are:: + + ARCH_SHSTK_SHSTK - Shadow stack + ARCH_SHSTK_WRSS - WRSS + +Currently shadow stack and WRSS are supported via this interface. WRSS +can only be enabled with shadow stack, and is automatically disabled +if shadow stack is disabled. + +Proc Status +=========== +To check if an application is actually running with shadow stack, the +user can read the /proc/$PID/status. It will report "wrss" or "shstk" +depending on what is enabled. The lines look like this:: + + x86_Thread_features: shstk wrss + x86_Thread_features_locked: shstk wrss + +Implementation of the Shadow Stack +================================== + +Shadow Stack Size +----------------- + +A task's shadow stack is allocated from memory to a fixed size of +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to +the maximum size of the normal stack, but capped to 4 GB. In the case +of the clone3 syscall, there is a stack size passed in and shadow stack +uses this instead of the rlimit. + +Signal +------ + +The main program and its signal handlers use the same shadow stack. Because +the shadow stack stores only return addresses, a large shadow stack covers +the condition that both the program stack and the signal alternate stack run +out. + +When a signal happens, the old pre-signal state is pushed on the stack. When +shadow stack is enabled, the shadow stack specific state is pushed onto the +shadow stack. Today this is only the old SSP (shadow stack pointer), pushed +in a special format with bit 63 set. On sigreturn this old SSP token is +verified and restored by the kernel. The kernel will also push the normal +restorer address to the shadow stack to help userspace avoid a shadow stack +violation on the sigreturn path that goes through the restorer. + +So the shadow stack signal frame format is as follows:: + + |1...old SSP| - Pointer to old pre-signal ssp in sigframe token format + (bit 63 set to 1) + | ...| - Other state may be added in the future + + +32 bit ABI signals are not supported in shadow stack processes. Linux prevents +32 bit execution while shadow stack is enabled by the allocating shadow stacks +outside of the 32 bit address space. When execution enters 32 bit mode, either +via far call or returning to userspace, a #GP is generated by the hardware +which, will be delivered to the process as a segfault. When transitioning to +userspace the register's state will be as if the userspace ip being returned to +caused the segfault. + +Fork +---- + +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a +shadow access triggers a page fault with the shadow stack access bit set +in the page fault error code. + +When a task forks a child, its shadow stack PTEs are copied and both the +parent's and the child's shadow stack PTEs are cleared of the dirty bit. +Upon the next shadow stack access, the resulting shadow stack page fault +is handled by page copy/re-use. + +When a pthread child is created, the kernel allocates a new shadow stack +for the new thread. New shadow stack creation behaves like mmap() with respect +to ASLR behavior. Similarly, on thread exit the thread's shadow stack is +disabled. + +Exec +---- + +On exec, shadow stack features are disabled by the kernel. At which point, +userspace can choose to re-enable, or lock them.