diff mbox series

[RFC,bpf-next] bpf: Make actual max tail call count as MAX_TAIL_CALL_CNT

Message ID 1631089206-5931-1-git-send-email-yangtiezhu@loongson.cn (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series [RFC,bpf-next] bpf: Make actual max tail call count as MAX_TAIL_CALL_CNT | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 33 of 33 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 10 this patch: 10
netdev/kdoc success Errors and warnings before: 4 this patch: 4
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 156 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/header_inline success Link
bpf/vmtest-bpf-next-PR pending PR summary
bpf/vmtest-bpf-next pending VM_Test

Commit Message

Tiezhu Yang Sept. 8, 2021, 8:20 a.m. UTC
In the current code, the actual max tail call count is 33 which is greater
than MAX_TAIL_CALL_CNT, this is not consistent with the intended meaning
in the commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
bpf programs"):

"The chain of tail calls can form unpredictable dynamic loops therefore
tail_call_cnt is used to limit the number of calls and currently is set
to 32."

Additionally, after commit 874be05f525e ("bpf, tests: Add tail call test
suite"), we can see there exists failed testcase.

On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
 # echo 0 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:0 ret 34 != 33 FAIL

On some archs:
 # echo 1 > /proc/sys/net/core/bpf_jit_enable
 # modprobe test_bpf
 # dmesg | grep -w FAIL
 Tail call error path, max count reached jited:1 ret 34 != 33 FAIL

with this patch, make the actual max tail call count as MAX_TAIL_CALL_CNT,
at the same time, the above failed testcase can be fixed.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---

Hi all,

This is a RFC patch, if I am wrong or I missed something,
please let me know, thank you!

 arch/arm/net/bpf_jit_32.c         | 11 ++++++-----
 arch/arm64/net/bpf_jit_comp.c     |  7 ++++---
 arch/mips/net/ebpf_jit.c          |  4 ++--
 arch/powerpc/net/bpf_jit_comp32.c |  4 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 12 ++++++------
 arch/riscv/net/bpf_jit_comp32.c   |  4 ++--
 arch/riscv/net/bpf_jit_comp64.c   |  4 ++--
 arch/sparc/net/bpf_jit_comp_64.c  |  8 ++++----
 kernel/bpf/core.c                 |  4 ++--
 9 files changed, 30 insertions(+), 28 deletions(-)

Comments

Daniel Borkmann Sept. 8, 2021, 8:47 a.m. UTC | #1
[ You have a huge Cc list, but forgot to add Paul and Johan who recently
   looked into this. Added here. ]

On 9/8/21 10:20 AM, Tiezhu Yang wrote:
> In the current code, the actual max tail call count is 33 which is greater
> than MAX_TAIL_CALL_CNT, this is not consistent with the intended meaning
> in the commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
> bpf programs"):
> 
> "The chain of tail calls can form unpredictable dynamic loops therefore
> tail_call_cnt is used to limit the number of calls and currently is set
> to 32."
> 
> Additionally, after commit 874be05f525e ("bpf, tests: Add tail call test
> suite"), we can see there exists failed testcase.
> 
> On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
>   # echo 0 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep -w FAIL
>   Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
> 
> On some archs:
>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
>   # modprobe test_bpf
>   # dmesg | grep -w FAIL
>   Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
> 
> with this patch, make the actual max tail call count as MAX_TAIL_CALL_CNT,
> at the same time, the above failed testcase can be fixed.
> 
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
> 
> Hi all,
> 
> This is a RFC patch, if I am wrong or I missed something,
> please let me know, thank you!

Yes, the original commit from 04fd61ab36ec ("bpf: allow bpf programs to tail-call
other bpf programs") got the counting wrong, but please also check f9dabe016b63
("bpf: Undo off-by-one in interpreter tail call count limit") where we agreed to
align everything to 33 in order to avoid changing existing behavior, and if we
intend to ever change the count, then only in terms of increasing but not decreasing
since that ship has sailed. Tiezhu, do you still see any arch that is not on 33
from your testing? Last time Paul fixed the remaining ones in 96bc4432f5ad ("bpf,
riscv: Limit to 33 tail calls") and e49e6f6db04e ("bpf, mips: Limit to 33 tail calls").

Thanks,
Daniel
Tiezhu Yang Sept. 8, 2021, 10:56 a.m. UTC | #2
On 09/08/2021 04:47 PM, Daniel Borkmann wrote:
> [ You have a huge Cc list, but forgot to add Paul and Johan who recently
>   looked into this. Added here. ]
>
> On 9/8/21 10:20 AM, Tiezhu Yang wrote:
>> In the current code, the actual max tail call count is 33 which is 
>> greater
>> than MAX_TAIL_CALL_CNT, this is not consistent with the intended meaning
>> in the commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
>> bpf programs"):
>>
>> "The chain of tail calls can form unpredictable dynamic loops therefore
>> tail_call_cnt is used to limit the number of calls and currently is set
>> to 32."
>>
>> Additionally, after commit 874be05f525e ("bpf, tests: Add tail call test
>> suite"), we can see there exists failed testcase.
>>
>> On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
>>   # echo 0 > /proc/sys/net/core/bpf_jit_enable
>>   # modprobe test_bpf
>>   # dmesg | grep -w FAIL
>>   Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
>>
>> On some archs:
>>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
>>   # modprobe test_bpf
>>   # dmesg | grep -w FAIL
>>   Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
>>
>> with this patch, make the actual max tail call count as 
>> MAX_TAIL_CALL_CNT,
>> at the same time, the above failed testcase can be fixed.
>>
>> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
>> ---
>>
>> Hi all,
>>
>> This is a RFC patch, if I am wrong or I missed something,
>> please let me know, thank you!
>
> Yes, the original commit from 04fd61ab36ec ("bpf: allow bpf programs 
> to tail-call
> other bpf programs") got the counting wrong, but please also check 
> f9dabe016b63
> ("bpf: Undo off-by-one in interpreter tail call count limit") where we 
> agreed to
> align everything to 33 in order to avoid changing existing behavior, 
> and if we
> intend to ever change the count, then only in terms of increasing but 
> not decreasing
> since that ship has sailed.

Thank you, understood.

But I still think there is some confusion about the macro MAX_TAIL_CALL_CNT
which is 32 and the actual value 33, I spent some time to understand it
at the first glance.

Is it impossible to keep the actual max tail call count consistent with
the value 32 of MAX_TAIL_CALL_CNT now?

At least, maybe we need to modify the testcase?

> Tiezhu, do you still see any arch that is not on 33
> from your testing?

If the testcase "Tail call error path, max count reached" in test_bpf is 
right,
it seems that the tail call count limit is 32 on x86, because the testcase
passed on x86 jited.

> Last time Paul fixed the remaining ones in 96bc4432f5ad ("bpf,
> riscv: Limit to 33 tail calls") and e49e6f6db04e ("bpf, mips: Limit to 
> 33 tail calls").
>
> Thanks,
> Daniel
Johan Almbladh Sept. 8, 2021, 11:36 a.m. UTC | #3
On Wed, Sep 8, 2021 at 12:56 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>
> On 09/08/2021 04:47 PM, Daniel Borkmann wrote:
> > [ You have a huge Cc list, but forgot to add Paul and Johan who recently
> >   looked into this. Added here. ]
> >
> > On 9/8/21 10:20 AM, Tiezhu Yang wrote:
> >> In the current code, the actual max tail call count is 33 which is
> >> greater
> >> than MAX_TAIL_CALL_CNT, this is not consistent with the intended meaning
> >> in the commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
> >> bpf programs"):
> >>
> >> "The chain of tail calls can form unpredictable dynamic loops therefore
> >> tail_call_cnt is used to limit the number of calls and currently is set
> >> to 32."
> >>
> >> Additionally, after commit 874be05f525e ("bpf, tests: Add tail call test
> >> suite"), we can see there exists failed testcase.
> >>
> >> On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
> >>   # echo 0 > /proc/sys/net/core/bpf_jit_enable
> >>   # modprobe test_bpf
> >>   # dmesg | grep -w FAIL
> >>   Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
> >>
> >> On some archs:
> >>   # echo 1 > /proc/sys/net/core/bpf_jit_enable
> >>   # modprobe test_bpf
> >>   # dmesg | grep -w FAIL
> >>   Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
> >>
> >> with this patch, make the actual max tail call count as
> >> MAX_TAIL_CALL_CNT,
> >> at the same time, the above failed testcase can be fixed.
> >>
> >> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> >> ---
> >>
> >> Hi all,
> >>
> >> This is a RFC patch, if I am wrong or I missed something,
> >> please let me know, thank you!
> >
> > Yes, the original commit from 04fd61ab36ec ("bpf: allow bpf programs
> > to tail-call
> > other bpf programs") got the counting wrong, but please also check
> > f9dabe016b63
> > ("bpf: Undo off-by-one in interpreter tail call count limit") where we
> > agreed to
> > align everything to 33 in order to avoid changing existing behavior,
> > and if we
> > intend to ever change the count, then only in terms of increasing but
> > not decreasing
> > since that ship has sailed.
>
> Thank you, understood.
>
> But I still think there is some confusion about the macro MAX_TAIL_CALL_CNT
> which is 32 and the actual value 33, I spent some time to understand it
> at the first glance.
>
> Is it impossible to keep the actual max tail call count consistent with
> the value 32 of MAX_TAIL_CALL_CNT now?

Yes. If the limit is 32 or 33 does not really matter, but there has to
be a limit. Since the actual limit has been 33, we don't want to break
any user space program relying on this value.

> At least, maybe we need to modify the testcase?

Before making any changes in the test or the BPF code, we need to
understand what the current behaviour actually is. Then we can make
the necessary changes to make everything consistent with 33.

> > Tiezhu, do you still see any arch that is not on 33
> > from your testing?
>
> If the testcase "Tail call error path, max count reached" in test_bpf is
> right,
> it seems that the tail call count limit is 32 on x86, because the testcase
> passed on x86 jited.

When I run the test_bpf.ko suite I get the following limits.

Interpreter: 33
JIT for arm{32,64}, mips, s390x, powerpc{32,64}, sparc: 33
JIT for x86-{32,64}: 32

However, there are also tail call tests in the selftests suite.
According to Paul those tests show that everything is consistent with
33 now. So, there seem to be a discrepancy between the test_bpf.ko
tests and the selftests.

I am trying to investigate this matter further, but so far I have only
been able to run the test_bpf.ko module for various archs in QEMU.
Link: https://github.com/almbladh/test-linux

I am currently working on getting the selftests to run in QEMU too,
using Buildroot. If you have a working setup for this, it would be
great if you could share that.

Thanks,
Johan

>
> > Last time Paul fixed the remaining ones in 96bc4432f5ad ("bpf,
> > riscv: Limit to 33 tail calls") and e49e6f6db04e ("bpf, mips: Limit to
> > 33 tail calls").
> >
> > Thanks,
> > Daniel
>
Tiezhu Yang Sept. 8, 2021, 11:57 a.m. UTC | #4
On 9/8/21 7:36 PM, Johan Almbladh wrote:
> On Wed, Sep 8, 2021 at 12:56 PM Tiezhu Yang <yangtiezhu@loongson.cn> wrote:
>> On 09/08/2021 04:47 PM, Daniel Borkmann wrote:
>>> [ You have a huge Cc list, but forgot to add Paul and Johan who recently
>>>    looked into this. Added here. ]
>>>
>>> On 9/8/21 10:20 AM, Tiezhu Yang wrote:
>>>> In the current code, the actual max tail call count is 33 which is
>>>> greater
>>>> than MAX_TAIL_CALL_CNT, this is not consistent with the intended meaning
>>>> in the commit 04fd61ab36ec ("bpf: allow bpf programs to tail-call other
>>>> bpf programs"):
>>>>
>>>> "The chain of tail calls can form unpredictable dynamic loops therefore
>>>> tail_call_cnt is used to limit the number of calls and currently is set
>>>> to 32."
>>>>
>>>> Additionally, after commit 874be05f525e ("bpf, tests: Add tail call test
>>>> suite"), we can see there exists failed testcase.
>>>>
>>>> On all archs when CONFIG_BPF_JIT_ALWAYS_ON is not set:
>>>>    # echo 0 > /proc/sys/net/core/bpf_jit_enable
>>>>    # modprobe test_bpf
>>>>    # dmesg | grep -w FAIL
>>>>    Tail call error path, max count reached jited:0 ret 34 != 33 FAIL
>>>>
>>>> On some archs:
>>>>    # echo 1 > /proc/sys/net/core/bpf_jit_enable
>>>>    # modprobe test_bpf
>>>>    # dmesg | grep -w FAIL
>>>>    Tail call error path, max count reached jited:1 ret 34 != 33 FAIL
>>>>
>>>> with this patch, make the actual max tail call count as
>>>> MAX_TAIL_CALL_CNT,
>>>> at the same time, the above failed testcase can be fixed.
>>>>
>>>> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
>>>> ---
>>>>
>>>> Hi all,
>>>>
>>>> This is a RFC patch, if I am wrong or I missed something,
>>>> please let me know, thank you!
>>> Yes, the original commit from 04fd61ab36ec ("bpf: allow bpf programs
>>> to tail-call
>>> other bpf programs") got the counting wrong, but please also check
>>> f9dabe016b63
>>> ("bpf: Undo off-by-one in interpreter tail call count limit") where we
>>> agreed to
>>> align everything to 33 in order to avoid changing existing behavior,
>>> and if we
>>> intend to ever change the count, then only in terms of increasing but
>>> not decreasing
>>> since that ship has sailed.
>> Thank you, understood.
>>
>> But I still think there is some confusion about the macro MAX_TAIL_CALL_CNT
>> which is 32 and the actual value 33, I spent some time to understand it
>> at the first glance.
>>
>> Is it impossible to keep the actual max tail call count consistent with
>> the value 32 of MAX_TAIL_CALL_CNT now?
> Yes. If the limit is 32 or 33 does not really matter, but there has to
> be a limit. Since the actual limit has been 33, we don't want to break
> any user space program relying on this value.


OK, I see, it is very clear now, thank you.

Since the actual limit is 33 now, can we change the value of
MAX_TAIL_CALL_CNT from 32 to 33 and then do some small changes
of the related code?

It will not change the current status but make the actual max
tail call count consistent with the value of MAX_TAIL_CALL_CNT.

If it is OK, I will send a patch later.

Thanks,
Tiezhu


>
>> At least, maybe we need to modify the testcase?
> Before making any changes in the test or the BPF code, we need to
> understand what the current behaviour actually is. Then we can make
> the necessary changes to make everything consistent with 33.
>
>>> Tiezhu, do you still see any arch that is not on 33
>>> from your testing?
>> If the testcase "Tail call error path, max count reached" in test_bpf is
>> right,
>> it seems that the tail call count limit is 32 on x86, because the testcase
>> passed on x86 jited.
> When I run the test_bpf.ko suite I get the following limits.
>
> Interpreter: 33
> JIT for arm{32,64}, mips, s390x, powerpc{32,64}, sparc: 33
> JIT for x86-{32,64}: 32
>
> However, there are also tail call tests in the selftests suite.
> According to Paul those tests show that everything is consistent with
> 33 now. So, there seem to be a discrepancy between the test_bpf.ko
> tests and the selftests.
>
> I am trying to investigate this matter further, but so far I have only
> been able to run the test_bpf.ko module for various archs in QEMU.
> Link: https://github.com/almbladh/test-linux
>
> I am currently working on getting the selftests to run in QEMU too,
> using Buildroot. If you have a working setup for this, it would be
> great if you could share that.
>
> Thanks,
> Johan
>
>>> Last time Paul fixed the remaining ones in 96bc4432f5ad ("bpf,
>>> riscv: Limit to 33 tail calls") and e49e6f6db04e ("bpf, mips: Limit to
>>> 33 tail calls").
>>>
>>> Thanks,
>>> Daniel
diff mbox series

Patch

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index a951276..39d9ae9 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1180,18 +1180,19 @@  static int emit_bpf_tail_call(struct jit_ctx *ctx)
 
 	/* tmp2[0] = array, tmp2[1] = index */
 
-	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
-	 *	goto out;
+	/*
 	 * tail_call_cnt++;
+	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 *	goto out;
 	 */
+	tc = arm_bpf_get_reg64(tcc, tmp, ctx);
+	emit(ARM_ADDS_I(tc[1], tc[1], 1), ctx);
+	emit(ARM_ADC_I(tc[0], tc[0], 0), ctx);
 	lo = (u32)MAX_TAIL_CALL_CNT;
 	hi = (u32)((u64)MAX_TAIL_CALL_CNT >> 32);
-	tc = arm_bpf_get_reg64(tcc, tmp, ctx);
 	emit(ARM_CMP_I(tc[0], hi), ctx);
 	_emit(ARM_COND_EQ, ARM_CMP_I(tc[1], lo), ctx);
 	_emit(ARM_COND_HI, ARM_B(jmp_offset), ctx);
-	emit(ARM_ADDS_I(tc[1], tc[1], 1), ctx);
-	emit(ARM_ADC_I(tc[0], tc[0], 0), ctx);
 	arm_bpf_put_reg64(tcc, tmp, ctx);
 
 	/* prog = array->ptrs[index]
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 41c23f4..5d6c843 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -286,14 +286,15 @@  static int emit_bpf_tail_call(struct jit_ctx *ctx)
 	emit(A64_CMP(0, r3, tmp), ctx);
 	emit(A64_B_(A64_COND_CS, jmp_offset), ctx);
 
-	/* if (tail_call_cnt > MAX_TAIL_CALL_CNT)
-	 *     goto out;
+	/*
 	 * tail_call_cnt++;
+	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 *     goto out;
 	 */
+	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 	emit_a64_mov_i64(tmp, MAX_TAIL_CALL_CNT, ctx);
 	emit(A64_CMP(1, tcc, tmp), ctx);
 	emit(A64_B_(A64_COND_HI, jmp_offset), ctx);
-	emit(A64_ADD_I(1, tcc, tcc, 1), ctx);
 
 	/* prog = array->ptrs[index];
 	 * if (prog == NULL)
diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 3a73e93..029fc34 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -617,14 +617,14 @@  static int emit_bpf_tail_call(struct jit_ctx *ctx, int this_idx)
 	b_off = b_imm(this_idx + 1, ctx);
 	emit_instr(ctx, bne, MIPS_R_AT, MIPS_R_ZERO, b_off);
 	/*
-	 * if (TCC-- < 0)
+	 * if (--TCC < 0)
 	 *     goto out;
 	 */
 	/* Delay slot */
 	tcc_reg = (ctx->flags & EBPF_TCC_IN_V1) ? MIPS_R_V1 : MIPS_R_S4;
 	emit_instr(ctx, daddiu, MIPS_R_T5, tcc_reg, -1);
 	b_off = b_imm(this_idx + 1, ctx);
-	emit_instr(ctx, bltz, tcc_reg, b_off);
+	emit_instr(ctx, bltz, MIPS_R_T5, b_off);
 	/*
 	 * prog = array->ptrs[index];
 	 * if (prog == NULL)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index beb12cb..b5585ad 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -221,12 +221,12 @@  static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	PPC_BCC(COND_GE, out);
 
 	/*
+	 * tail_call_cnt++;
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
 	 *   goto out;
 	 */
-	EMIT(PPC_RAW_CMPLWI(_R0, MAX_TAIL_CALL_CNT));
-	/* tail_call_cnt++; */
 	EMIT(PPC_RAW_ADDIC(_R0, _R0, 1));
+	EMIT(PPC_RAW_CMPLWI(_R0, MAX_TAIL_CALL_CNT));
 	PPC_BCC(COND_GT, out);
 
 	/* prog = array->ptrs[index]; */
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index b87a63d..bb15cc4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -227,6 +227,12 @@  static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	PPC_BCC(COND_GE, out);
 
 	/*
+	 * tail_call_cnt++;
+	 */
+	EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1], 1));
+	PPC_BPF_STL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
+
+	/*
 	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
 	 *   goto out;
 	 */
@@ -234,12 +240,6 @@  static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	EMIT(PPC_RAW_CMPLWI(b2p[TMP_REG_1], MAX_TAIL_CALL_CNT));
 	PPC_BCC(COND_GT, out);
 
-	/*
-	 * tail_call_cnt++;
-	 */
-	EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], b2p[TMP_REG_1], 1));
-	PPC_BPF_STL(b2p[TMP_REG_1], 1, bpf_jit_stack_tailcallcnt(ctx));
-
 	/* prog = array->ptrs[index]; */
 	EMIT(PPC_RAW_MULI(b2p[TMP_REG_1], b2p_index, 8));
 	EMIT(PPC_RAW_ADD(b2p[TMP_REG_1], b2p[TMP_REG_1], b2p_bpf_array));
diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
index e649742..1608d94 100644
--- a/arch/riscv/net/bpf_jit_comp32.c
+++ b/arch/riscv/net/bpf_jit_comp32.c
@@ -800,12 +800,12 @@  static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 
 	/*
 	 * temp_tcc = tcc - 1;
-	 * if (tcc < 0)
+	 * if (temp_tcc < 0)
 	 *   goto out;
 	 */
 	emit(rv_addi(RV_REG_T1, RV_REG_TCC, -1), ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
-	emit_bcc(BPF_JSLT, RV_REG_TCC, RV_REG_ZERO, off, ctx);
+	emit_bcc(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
 
 	/*
 	 * prog = array->ptrs[index];
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 3af4131..6e9ba83 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -311,12 +311,12 @@  static int emit_bpf_tail_call(int insn, struct rv_jit_context *ctx)
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
 	emit_branch(BPF_JGE, RV_REG_A2, RV_REG_T1, off, ctx);
 
-	/* if (TCC-- < 0)
+	/* if (--TCC < 0)
 	 *     goto out;
 	 */
 	emit_addi(RV_REG_T1, tcc, -1, ctx);
 	off = ninsns_rvoff(tc_ninsn - (ctx->ninsns - start_insn));
-	emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
+	emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
 
 	/* prog = array->ptrs[index];
 	 * if (!prog)
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 9a2f20c..50d914c 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -863,6 +863,10 @@  static void emit_tail_call(struct jit_ctx *ctx)
 	emit_branch(BGEU, ctx->idx, ctx->idx + OFFSET1, ctx);
 	emit_nop(ctx);
 
+	emit_alu_K(ADD, tmp, 1, ctx);
+	off = BPF_TAILCALL_CNT_SP_OFF;
+	emit(ST32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
+
 	off = BPF_TAILCALL_CNT_SP_OFF;
 	emit(LD32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
 	emit_cmpi(tmp, MAX_TAIL_CALL_CNT, ctx);
@@ -870,10 +874,6 @@  static void emit_tail_call(struct jit_ctx *ctx)
 	emit_branch(BGU, ctx->idx, ctx->idx + OFFSET2, ctx);
 	emit_nop(ctx);
 
-	emit_alu_K(ADD, tmp, 1, ctx);
-	off = BPF_TAILCALL_CNT_SP_OFF;
-	emit(ST32 | IMMED | RS1(SP) | S13(off) | RD(tmp), ctx);
-
 	emit_alu3_K(SLL, bpf_index, 3, tmp, ctx);
 	emit_alu(ADD, bpf_array, tmp, ctx);
 	off = offsetof(struct bpf_array, ptrs);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9f4636d..8edb1c3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1564,10 +1564,10 @@  static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn)
 
 		if (unlikely(index >= array->map.max_entries))
 			goto out;
-		if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
-			goto out;
 
 		tail_call_cnt++;
+		if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
+			goto out;
 
 		prog = READ_ONCE(array->ptrs[index]);
 		if (!prog)