[v4,1/4] x86/ibt: factor out cfi and fineibt offset

Message ID 20250303132837.498938-2-dongml2@chinatelecom.cn (mailing list archive)
State New
Series per-function storage support

Commit Message

Menglong Dong March 3, 2025, 1:28 p.m. UTC
For now, the layout of cfi and fineibt is hard-coded, and the padding is
fixed at 16 bytes.

Factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET. CFI_INSN_OFFSET is
the offset of cfi, which is the same as FUNCTION_ALIGNMENT when
CALL_PADDING is enabled. FINEIBT_INSN_OFFSET is the offset where we
put the fineibt preamble, which is 16 for now.

When FUNCTION_ALIGNMENT is larger than 16, we place the fineibt
preamble in the last 16 bytes of the padding for better performance, which
means the fineibt preamble doesn't use the space that cfi uses.

The FINEIBT_INSN_OFFSET is not used in fineibt_caller_start and
fineibt_paranoid_start, as it is always "0x10". Note that we need to
update the offset in fineibt_caller_start and fineibt_paranoid_start if
FINEIBT_INSN_OFFSET changes.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v4:
- rebased to the newest tip/x86/core; fineibt has some updates
---
 arch/x86/include/asm/cfi.h    | 13 +++++++++----
 arch/x86/kernel/alternative.c | 18 +++++++++++-------
 arch/x86/net/bpf_jit_comp.c   | 22 +++++++++++-----------
 3 files changed, 31 insertions(+), 22 deletions(-)

Comments

Peter Zijlstra March 3, 2025, 4:54 p.m. UTC | #1
On Mon, Mar 03, 2025 at 09:28:34PM +0800, Menglong Dong wrote:
> For now, the layout of cfi and fineibt is hard coded, and the padding is
> fixed on 16 bytes.
> 
> Factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET. CFI_INSN_OFFSET is
> the offset of cfi, which is the same as FUNCTION_ALIGNMENT when
> CALL_PADDING is enabled. And FINEIBT_INSN_OFFSET is the offset where we
> put the fineibt preamble on, which is 16 for now.
> 
> When the FUNCTION_ALIGNMENT is bigger than 16, we place the fineibt
> preamble on the last 16 bytes of the padding for better performance, which
> means the fineibt preamble don't use the space that cfi uses.
> 
> The FINEIBT_INSN_OFFSET is not used in fineibt_caller_start and
> fineibt_paranoid_start, as it is always "0x10". Note that we need to
> update the offset in fineibt_caller_start and fineibt_paranoid_start if
> FINEIBT_INSN_OFFSET changes.
> 
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>

I'm confused as to what exactly you mean.

Preamble will have __cfi symbol and some number of NOPs right before
actual symbol like:

__cfi_foo:
  mov $0x12345678, %reg
  nop
  nop
  nop
  ...
foo:

FineIBT must be at foo-16, has nothing to do with performance. This 16
can also be spelled: fineibt_preamble_size.

The total size of the preamble is FUNCTION_PADDING_BYTES + CFI_CLANG*5.

If you increase FUNCTION_PADDING_BYTES by another 5, which is what you
want I think, then we'll have total preamble of 21 bytes; 5 bytes kCFI,
16 bytes nop.

Then kCFI expects hash to be at -20, while FineIBT must be at -16.

This then means there is no unambiguous hole for you to stick your
meta-data thing (whatever that is).

There are two options: make meta data location depend on cfi_mode, or
have __apply_fineibt() rewrite kCFI to also be at -16, so that you can
have -21 for your 5 bytes.

I think I prefer latter.

In any case, I don't think we need *_INSN_OFFSET. At most we need
PREAMBLE_SIZE.

Hmm?
Menglong Dong March 4, 2025, 1:10 a.m. UTC | #2
On Tue, Mar 4, 2025 at 12:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 03, 2025 at 09:28:34PM +0800, Menglong Dong wrote:
> > For now, the layout of cfi and fineibt is hard coded, and the padding is
> > fixed on 16 bytes.
> >
> > Factor out FINEIBT_INSN_OFFSET and CFI_INSN_OFFSET. CFI_INSN_OFFSET is
> > the offset of cfi, which is the same as FUNCTION_ALIGNMENT when
> > CALL_PADDING is enabled. And FINEIBT_INSN_OFFSET is the offset where we
> > put the fineibt preamble on, which is 16 for now.
> >
> > When the FUNCTION_ALIGNMENT is bigger than 16, we place the fineibt
> > preamble on the last 16 bytes of the padding for better performance, which
> > means the fineibt preamble don't use the space that cfi uses.
> >
> > The FINEIBT_INSN_OFFSET is not used in fineibt_caller_start and
> > fineibt_paranoid_start, as it is always "0x10". Note that we need to
> > update the offset in fineibt_caller_start and fineibt_paranoid_start if
> > FINEIBT_INSN_OFFSET changes.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
>
> I'm confused as to what exactly you mean.
>
> Preamble will have __cfi symbol and some number of NOPs right before
> actual symbol like:
>
> __cfi_foo:
>   mov $0x12345678, %reg
>   nop
>   nop
>   nop
>   ...
> foo:
>
> FineIBT must be at foo-16, has nothing to do with performance. This 16
> can also be spelled: fineibt_preamble_size.
>
> The total size of the preamble is FUNCTION_PADDING_BYTES + CFI_CLANG*5.
>
> If you increase FUNCTION_PADDING_BYTES by another 5, which is what you
> want I think, then we'll have total preamble of 21 bytes; 5 bytes kCFI,
> 16 bytes nop.

Hello, sorry that I forgot to add something to the changelog. In fact,
I don't add the extra 5 bytes anymore, as you can see in the 3rd patch.

The thing is that we can't add an extra 5 bytes if CFI is enabled. Without
CFI, we can make the padding space any value, such as 5 bytes, and
the layout will be like this:

__align:
  nop
  nop
  nop
  nop
  nop
foo: -- __align +5

However, CFI will always make the cfi insn 16-byte aligned. When
we set FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
like this:

__cfi_foo:
  nop (11)
  mov $0x12345678, %reg
  nop (16)
foo:

and the padding space is actually 32 bytes. So we can just select
FUNCTION_ALIGNMENT_32B instead, which makes the padding
space 32 bytes too, and get the following layout:

__cfi_foo:
  mov $0x12345678, %reg
  nop (27)
foo:

And the layout will be like this if fineibt and function metadata are both
used:

__cfi_foo:
        mov     --      5       -- cfi, not used anymore if fineibt is used
        nop
        nop
        nop
        mov     --      5       -- function metadata
        nop
        nop
        nop
        fineibt --      16      -- fineibt
foo:
        nopw    --      4
        ......

What I do in this commit is make sure that the code in
arch/x86/kernel/alternative.c can find the location of the cfi hash and
the fineibt preamble depending on FUNCTION_ALIGNMENT. The offsets of
cfi and fineibt are different now, so we need to do some adjustment here.

In the beginning, I thought of making the layout like this to ensure
that the offsets of cfi and fineibt are the same:

__cfi_foo:
        fineibt  --   16  --  fineibt
        mov    --    5    -- function metadata
        nop(11)
foo:
        nopw    --      4
        ......

The adjustment would be easier in this mode. However, it may have an
impact on performance. That's why I say it doesn't impact the
performance in this commit.

Sorry that I didn't describe it clearly in the commit log; I'll add the
points above to the commit log in the next version.

Thanks!
Menglong Dong

>
> Then kCFI expects hash to be at -20, while FineIBT must be at -16.
>
> This then means there is no unambiguous hole for you to stick your
> meta-data thing (whatever that is).
>
> There are two options: make meta data location depend on cfi_mode, or
> have __apply_fineibt() rewrite kCFI to also be at -16, so that you can
> have -21 for your 5 bytes.
>
> I think I prefer latter.
>
> In any case, I don't think we need *_INSN_OFFSET. At most we need
> PREAMBLE_SIZE.
>
> Hmm?
Peter Zijlstra March 4, 2025, 5:38 a.m. UTC | #3
On Tue, Mar 04, 2025 at 09:10:12AM +0800, Menglong Dong wrote:
> Hello, sorry that I forgot to add something to the changelog. In fact,
> I don't add extra 5-bytes anymore, which you can see in the 3rd patch.
> 
> The thing is that we can't add extra 5-bytes if CFI is enabled. Without
> CFI, we can make the padding space any value, such as 5-bytes, and
> the layout will be like this:
> 
> __align:
>   nop
>   nop
>   nop
>   nop
>   nop
> foo: -- __align +5
> 
> However, the CFI will always make the cfi insn 16-bytes aligned. When
> we set the FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
> like this:
> 
> __cfi_foo:
>   nop (11)
>   mov $0x12345678, %reg
>   nop (16)
> foo:
> 
> and the padding space is 32-bytes actually. So, we can just select
> FUNCTION_ALIGNMENT_32B instead, which makes the padding
> space 32-bytes too, and have the following layout:
> 
> __cfi_foo:
>   mov $0x12345678, %reg
>   nop (27)
> foo:

*blink*, wtf is clang smoking.

I mean, you're right, this is what it is doing, but that is somewhat
unexpected. Let me go look at clang source, this is insane.
Peter Zijlstra March 4, 2025, 6:16 a.m. UTC | #4
On Tue, Mar 04, 2025 at 06:38:53AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 04, 2025 at 09:10:12AM +0800, Menglong Dong wrote:
> > Hello, sorry that I forgot to add something to the changelog. In fact,
> > I don't add extra 5-bytes anymore, which you can see in the 3rd patch.
> > 
> > The thing is that we can't add extra 5-bytes if CFI is enabled. Without
> > CFI, we can make the padding space any value, such as 5-bytes, and
> > the layout will be like this:
> > 
> > __align:
> >   nop
> >   nop
> >   nop
> >   nop
> >   nop
> > foo: -- __align +5
> > 
> > However, the CFI will always make the cfi insn 16-bytes aligned. When
> > we set the FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
> > like this:
> > 
> > __cfi_foo:
> >   nop (11)
> >   mov $0x12345678, %reg
> >   nop (16)
> > foo:
> > 
> > and the padding space is 32-bytes actually. So, we can just select
> > FUNCTION_ALIGNMENT_32B instead, which makes the padding
> > space 32-bytes too, and have the following layout:
> > 
> > __cfi_foo:
> >   mov $0x12345678, %reg
> >   nop (27)
> > foo:
> 
> *blink*, wtf is clang smoking.
> 
> I mean, you're right, this is what it is doing, but that is somewhat
> unexpected. Let me go look at clang source, this is insane.

Bah, this is because assemblers are stupid :/

There is no way to tell them to have foo aligned such that there are at
least N bytes free before it.

So what kCFI ends up having to do is align the __cfi symbol to the
function alignment, and then stuff enough nops in to make the real
symbol meet alignment.

And the end result is utter insanity.

I mean, look at this:

      50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
      56:       66 2e 0f 1f 84 00 00 00 00 00   cs nopw 0x0(%rax,%rax,1)

0000000000000060 <__cfi___probestub_initcall_level>:
      60:       90                      nop
      61:       90                      nop
      62:       90                      nop
      63:       90                      nop
      64:       90                      nop
      65:       90                      nop
      66:       90                      nop
      67:       90                      nop
      68:       90                      nop
      69:       90                      nop
      6a:       90                      nop
      6b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax

0000000000000070 <__probestub_initcall_level>:
      70:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4


That's 21 bytes wasted, for no reason other than that asm doesn't have a
directive to say: get me a place that is M before N alignment.

Because ideally the whole above thing would look like:

      50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
      56:       66 2e 0f 1f 84 		cs nopw (%rax,%rax,1)

000000000000005b <__cfi___probestub_initcall_level>:
      5b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax

0000000000000060 <__probestub_initcall_level>:
      60:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4
Menglong Dong March 4, 2025, 7:47 a.m. UTC | #5
On Tue, Mar 4, 2025 at 2:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 04, 2025 at 06:38:53AM +0100, Peter Zijlstra wrote:
> > On Tue, Mar 04, 2025 at 09:10:12AM +0800, Menglong Dong wrote:
> > > Hello, sorry that I forgot to add something to the changelog. In fact,
> > > I don't add extra 5-bytes anymore, which you can see in the 3rd patch.
> > >
> > > The thing is that we can't add extra 5-bytes if CFI is enabled. Without
> > > CFI, we can make the padding space any value, such as 5-bytes, and
> > > the layout will be like this:
> > >
> > > __align:
> > >   nop
> > >   nop
> > >   nop
> > >   nop
> > >   nop
> > > foo: -- __align +5
> > >
> > > However, the CFI will always make the cfi insn 16-bytes aligned. When
> > > we set the FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
> > > like this:
> > >
> > > __cfi_foo:
> > >   nop (11)
> > >   mov $0x12345678, %reg
> > >   nop (16)
> > > foo:
> > >
> > > and the padding space is 32-bytes actually. So, we can just select
> > > FUNCTION_ALIGNMENT_32B instead, which makes the padding
> > > space 32-bytes too, and have the following layout:
> > >
> > > __cfi_foo:
> > >   mov $0x12345678, %reg
> > >   nop (27)
> > > foo:
> >
> > *blink*, wtf is clang smoking.
> >
> > I mean, you're right, this is what it is doing, but that is somewhat
> > unexpected. Let me go look at clang source, this is insane.
>
> Bah, this is because assemblers are stupid :/
>
> There is no way to tell them to have foo aligned such that there are at
> least N bytes free before it.
>
> So what kCFI ends up having to do is align the __cfi symbol to the
> function alignment, and then stuff enough nops in to make the real
> symbol meet alignment.
>
> And the end result is utter insanity.
>
> I mean, look at this:
>
>       50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
>       56:       66 2e 0f 1f 84 00 00 00 00 00   cs nopw 0x0(%rax,%rax,1)
>
> 0000000000000060 <__cfi___probestub_initcall_level>:
>       60:       90                      nop
>       61:       90                      nop
>       62:       90                      nop
>       63:       90                      nop
>       64:       90                      nop
>       65:       90                      nop
>       66:       90                      nop
>       67:       90                      nop
>       68:       90                      nop
>       69:       90                      nop
>       6a:       90                      nop
>       6b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax
>
> 0000000000000070 <__probestub_initcall_level>:
>       70:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4
>
>
> That's 21 bytes wasted, for no reason other than that asm doesn't have a
> directive to say: get me a place that is M before N alignment.
>
> Because ideally the whole above thing would look like:
>
>       50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
>       56:       66 2e 0f 1f 84          cs nopw (%rax,%rax,1)
>
> 000000000000005b <__cfi___probestub_initcall_level>:
>       5b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax
>
> 0000000000000060 <__probestub_initcall_level>:
>       60:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4

Hi, peter. Thank you for the testing, which is quite helpful
to understand the whole thing.

I was surprised at this too. Without CALL_PADDING, the cfi is
nop(11) + mov; with CALL_PADDING, the cfi is mov + nop(11),
which is weird, as it seems that we could select CALL_PADDING if
CFI_CLANG to make things consistent. And I thought that it was
designed this way for some reason :/

Hmm......so what should we do now? Accept and bear it,
or do something different?

I'm good at clang, so the solution that I can think of is how to
bear it :/

According to my testing, the text size will increase:

~2.2% if we make FUNCTION_PADDING_BYTES 27 and select
FUNCTION_ALIGNMENT_16B.

~3.5% if we make FUNCTION_PADDING_BYTES 27 and select
FUNCTION_ALIGNMENT_32B.

We don't have to select FUNCTION_ALIGNMENT_32B, so the
worst case is an increase of ~2.2%.

What do you think?

Thanks!
Menglong Dong

>
>
>
Menglong Dong March 4, 2025, 8:41 a.m. UTC | #6
On Tue, Mar 4, 2025 at 3:47 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Tue, Mar 4, 2025 at 2:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Mar 04, 2025 at 06:38:53AM +0100, Peter Zijlstra wrote:
> > > On Tue, Mar 04, 2025 at 09:10:12AM +0800, Menglong Dong wrote:
> > > > Hello, sorry that I forgot to add something to the changelog. In fact,
> > > > I don't add extra 5-bytes anymore, which you can see in the 3rd patch.
> > > >
> > > > The thing is that we can't add extra 5-bytes if CFI is enabled. Without
> > > > CFI, we can make the padding space any value, such as 5-bytes, and
> > > > the layout will be like this:
> > > >
> > > > __align:
> > > >   nop
> > > >   nop
> > > >   nop
> > > >   nop
> > > >   nop
> > > > foo: -- __align +5
> > > >
> > > > However, the CFI will always make the cfi insn 16-bytes aligned. When
> > > > we set the FUNCTION_PADDING_BYTES to (11 + 5), the layout will be
> > > > like this:
> > > >
> > > > __cfi_foo:
> > > >   nop (11)
> > > >   mov $0x12345678, %reg
> > > >   nop (16)
> > > > foo:
> > > >
> > > > and the padding space is 32-bytes actually. So, we can just select
> > > > FUNCTION_ALIGNMENT_32B instead, which makes the padding
> > > > space 32-bytes too, and have the following layout:
> > > >
> > > > __cfi_foo:
> > > >   mov $0x12345678, %reg
> > > >   nop (27)
> > > > foo:
> > >
> > > *blink*, wtf is clang smoking.
> > >
> > > I mean, you're right, this is what it is doing, but that is somewhat
> > > unexpected. Let me go look at clang source, this is insane.
> >
> > Bah, this is because assemblers are stupid :/
> >
> > There is no way to tell them to have foo aligned such that there are at
> > least N bytes free before it.
> >
> > So what kCFI ends up having to do is align the __cfi symbol to the
> > function alignment, and then stuff enough nops in to make the real
> > symbol meet alignment.
> >
> > And the end result is utter insanity.
> >
> > I mean, look at this:
> >
> >       50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
> >       56:       66 2e 0f 1f 84 00 00 00 00 00   cs nopw 0x0(%rax,%rax,1)
> >
> > 0000000000000060 <__cfi___probestub_initcall_level>:
> >       60:       90                      nop
> >       61:       90                      nop
> >       62:       90                      nop
> >       63:       90                      nop
> >       64:       90                      nop
> >       65:       90                      nop
> >       66:       90                      nop
> >       67:       90                      nop
> >       68:       90                      nop
> >       69:       90                      nop
> >       6a:       90                      nop
> >       6b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax
> >
> > 0000000000000070 <__probestub_initcall_level>:
> >       70:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4
> >
> >
> > That's 21 bytes wasted, for no reason other than that asm doesn't have a
> > directive to say: get me a place that is M before N alignment.
> >
> > Because ideally the whole above thing would look like:
> >
> >       50:       2e e9 00 00 00 00       cs jmp 56 <__traceiter_initcall_level+0x46>     52: R_X86_64_PLT32      __x86_return_thunk-0x4
> >       56:       66 2e 0f 1f 84          cs nopw (%rax,%rax,1)
> >
> > 000000000000005b <__cfi___probestub_initcall_level>:
> >       5b:       b8 b1 fd 66 f9          mov    $0xf966fdb1,%eax
> >
> > 0000000000000060 <__probestub_initcall_level>:
> >       60:       2e e9 00 00 00 00       cs jmp 76 <__probestub_initcall_level+0x6>      72: R_X86_64_PLT32      __x86_return_thunk-0x4
>
> Hi, peter. Thank you for the testing, which is quite helpful
> to understand the whole thing.
>
> I was surprised at this too. Without CALL_PADDING, the cfi is
> nop(11) + mov; with CALL_PADDING, the cfi is mov + nop(11),
> which is weird, as it seems that we can select CALL_PADDING if
> CFI_CLANG to make things consistent. And I  thought that it is
> designed to be this for some reasons :/
>
> Hmm......so what should we do now? Accept and bear it,
> or do something different?
>
> I'm good at clang, so the solution that I can think of is how to

*not good at*

> bear it :/
>
> According to my testing, the text size will increase:
>
> ~2.2% if we make FUNCTION_PADDING_BYTES 27 and select
> FUNCTION_ALIGNMENT_16B.
>
> ~3.5% if we make FUNCTION_PADDING_BYTES 27 and select
> FUNCTION_ALIGNMENT_32B.
>
> We don't have to select FUNCTION_ALIGNMENT_32B, so the
> worst case is to increase ~2.2%.
>
> What do you think?
>
> Thanks!
> Menglong Dong
>
> >
> >
> >
Peter Zijlstra March 4, 2025, 9:42 a.m. UTC | #7
On Tue, Mar 04, 2025 at 03:47:45PM +0800, Menglong Dong wrote:
> We don't have to select FUNCTION_ALIGNMENT_32B, so the
> worst case is to increase ~2.2%.
> 
> What do you think?

Well, since I don't understand what you need this for at all, I'm firmly
on the side of not doing this.

What actual problem is being solved with this meta data nonsense? Why is
it worth blowing up our I$ footprint over.

Also note, that if you're going to be explaining this, start from
scratch, as I have absolutely 0 clues about BPF and such.

Patch

diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
index 2f6a01f098b5..04525f2f6bf2 100644
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -108,6 +108,14 @@  extern bhi_thunk __bhi_args_end[];
 
 struct pt_regs;
 
+#ifdef CONFIG_CALL_PADDING
+#define FINEIBT_INSN_OFFSET	16
+#define CFI_INSN_OFFSET		CONFIG_FUNCTION_ALIGNMENT
+#else
+#define FINEIBT_INSN_OFFSET	0
+#define CFI_INSN_OFFSET		5
+#endif
+
 #ifdef CONFIG_CFI_CLANG
 enum bug_trap_type handle_cfi_failure(struct pt_regs *regs);
 #define __bpfcall
@@ -118,11 +126,8 @@  static inline int cfi_get_offset(void)
 {
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
-		return 16;
 	case CFI_KCFI:
-		if (IS_ENABLED(CONFIG_CALL_PADDING))
-			return 16;
-		return 5;
+		return CFI_INSN_OFFSET;
 	default:
 		return 0;
 	}
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 32e4b801db99..0088d2313f33 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -917,7 +917,7 @@  void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end)
 
 		poison_endbr(addr);
 		if (IS_ENABLED(CONFIG_FINEIBT))
-			poison_cfi(addr - 16);
+			poison_cfi(addr);
 	}
 }
 
@@ -980,12 +980,13 @@  u32 cfi_get_func_hash(void *func)
 {
 	u32 hash;
 
-	func -= cfi_get_offset();
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
+		func -= FINEIBT_INSN_OFFSET;
 		func += 7;
 		break;
 	case CFI_KCFI:
+		func -= CFI_INSN_OFFSET;
 		func += 1;
 		break;
 	default:
@@ -1372,7 +1373,7 @@  static int cfi_rewrite_preamble(s32 *start, s32 *end)
 		 * have determined there are no indirect calls to it and we
 		 * don't need no CFI either.
 		 */
-		if (!is_endbr(addr + 16))
+		if (!is_endbr(addr + CFI_INSN_OFFSET))
 			continue;
 
 		hash = decode_preamble_hash(addr, &arity);
@@ -1380,6 +1381,7 @@  static int cfi_rewrite_preamble(s32 *start, s32 *end)
 			 addr, addr, 5, addr))
 			return -EINVAL;
 
+		addr += (CFI_INSN_OFFSET - FINEIBT_INSN_OFFSET);
 		text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
 		WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
 		text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
@@ -1402,10 +1404,10 @@  static void cfi_rewrite_endbr(s32 *start, s32 *end)
 	for (s = start; s < end; s++) {
 		void *addr = (void *)s + *s;
 
-		if (!exact_endbr(addr + 16))
+		if (!exact_endbr(addr + CFI_INSN_OFFSET))
 			continue;
 
-		poison_endbr(addr + 16);
+		poison_endbr(addr + CFI_INSN_OFFSET);
 	}
 }
 
@@ -1543,12 +1545,12 @@  static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
 		return;
 
 	case CFI_FINEIBT:
-		/* place the FineIBT preamble at func()-16 */
+		/* place the FineIBT preamble at func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_preamble(start_cfi, end_cfi);
 		if (ret)
 			goto err;
 
-		/* rewrite the callers to target func()-16 */
+		/* rewrite the callers to target func()-FINEIBT_INSN_OFFSET */
 		ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
 		if (ret)
 			goto err;
@@ -1588,6 +1590,7 @@  static void poison_cfi(void *addr)
 	 */
 	switch (cfi_mode) {
 	case CFI_FINEIBT:
+		addr -= FINEIBT_INSN_OFFSET;
 		/*
 		 * FineIBT prefix should start with an ENDBR.
 		 */
@@ -1607,6 +1610,7 @@  static void poison_cfi(void *addr)
 		break;
 
 	case CFI_KCFI:
+		addr -= CFI_INSN_OFFSET;
 		/*
 		 * kCFI prefix should start with a valid hash.
 		 */
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 72776dcb75aa..ee86a5df5ffb 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -415,6 +415,12 @@  static int emit_call(u8 **prog, void *func, void *ip);
 static void emit_fineibt(u8 **pprog, u8 *ip, u32 hash, int arity)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+
+	for (i = 0; i < CFI_INSN_OFFSET - 16; i++)
+		EMIT1(0x90);
+#endif
 
 	EMIT_ENDBR();
 	EMIT3_off32(0x41, 0x81, 0xea, hash);		/* subl $hash, %r10d	*/
@@ -432,20 +438,14 @@  static void emit_fineibt(u8 **pprog, u8 *ip, u32 hash, int arity)
 static void emit_kcfi(u8 **pprog, u32 hash)
 {
 	u8 *prog = *pprog;
+#ifdef CONFIG_CALL_PADDING
+	int i;
+#endif
 
 	EMIT1_off32(0xb8, hash);			/* movl $hash, %eax	*/
 #ifdef CONFIG_CALL_PADDING
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
-	EMIT1(0x90);
+	for (i = 0; i < CFI_INSN_OFFSET - 5; i++)
+		EMIT1(0x90);
 #endif
 	EMIT_ENDBR();