diff mbox series

accel/tcg: Call tcg_tb_insert() for one-insn TBs

Message ID 20250115232022.27332-1-iii@linux.ibm.com (mailing list archive)
State New
Headers show
Series accel/tcg: Call tcg_tb_insert() for one-insn TBs | expand

Commit Message

Ilya Leoshkevich Jan. 15, 2025, 11:20 p.m. UTC
Currently single-insn TBs created from I/O memory are not added to
region_trees. Therefore, when they generate exceptions, they are not
handled by cpu_restore_state_from_tb(). For x86 this is not a problem,
because x86_restore_state_to_opc() only restores pc and cc, which are
already correct. However, on several other architectures,
restore_state_to_opc() restores more registers, and guests can notice
incorrect values.

Fix by always calling tcg_tb_insert(). This may increase the size of
region_trees, but tcg_region_reset_all() clears it once code_gen_buffer
fills up, so it will not grow uncontrollably.

Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 accel/tcg/translate-all.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

Richard Henderson Jan. 16, 2025, 12:08 a.m. UTC | #1
On 1/15/25 15:20, Ilya Leoshkevich wrote:
> Currently single-insn TBs created from I/O memory are not added to
> region_trees. Therefore, when they generate exceptions, they are not
> handled by cpu_restore_state_from_tb(). For x86 this is not a problem,
> because x86_restore_state_to_opc() only restores pc and cc, which are
> already correct. However, on several other architectures,
> restore_state_to_opc() restores more registers, and guests can notice
> incorrect values.
> 
> Fix by always calling tcg_tb_insert(). This may increase the size of
> region_trees, but tcg_region_reset_all() clears it once code_gen_buffer
> fills up, so it will not grow uncontrollably.
> 
> Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---

This needs something else.  The reason why they're not insertted is that they're not valid 
for a second execution.  We need to not find them in the search tree.


r~

>   accel/tcg/translate-all.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 453eb20ec95..6333302813e 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>           tb_reset_jump(tb, 1);
>       }
>   
> +    /*
> +     * Insert TB into the corresponding region tree before publishing it
> +     * through QHT. Otherwise rewinding happened in the TB might fail to
> +     * lookup itself using host PC.
> +     */
> +    tcg_tb_insert(tb);
> +
>       /*
>        * If the TB is not associated with a physical RAM page then it must be
>        * a temporary one-insn TB, and we have nothing left to do. Return early
> -     * before attempting to link to other TBs or add to the lookup table.
> +     * before attempting to link to other TBs.
>        */
>       if (tb_page_addr0(tb) == -1) {
>           assert_no_pages_locked();
>           return tb;
>       }
>   
> -    /*
> -     * Insert TB into the corresponding region tree before publishing it
> -     * through QHT. Otherwise rewinding happened in the TB might fail to
> -     * lookup itself using host PC.
> -     */
> -    tcg_tb_insert(tb);
> -
>       /*
>        * No explicit memory barrier is required -- tb_link_page() makes the
>        * TB visible in a consistent state.
Ilya Leoshkevich Jan. 16, 2025, 9:19 a.m. UTC | #2
On Wed, 2025-01-15 at 16:08 -0800, Richard Henderson wrote:
> On 1/15/25 15:20, Ilya Leoshkevich wrote:
> > Currently single-insn TBs created from I/O memory are not added to
> > region_trees. Therefore, when they generate exceptions, they are
> > not
> > handled by cpu_restore_state_from_tb(). For x86 this is not a
> > problem,
> > because x86_restore_state_to_opc() only restores pc and cc, which
> > are
> > already correct. However, on several other architectures,
> > restore_state_to_opc() restores more registers, and guests can
> > notice
> > incorrect values.
> > 
> > Fix by always calling tcg_tb_insert(). This may increase the size
> > of
> > region_trees, but tcg_region_reset_all() clears it once
> > code_gen_buffer
> > fills up, so it will not grow uncontrollably.
> > 
> > Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
> 
> This needs something else.  The reason why they're not insertted is
> that they're not valid 
> for a second execution.  We need to not find them in the search tree.

I have the impression that code_gen_buffer is append-only, so after we
create a new TB for the second execution, the first TB should not
be deleted - is this correct? At least I haven't found code_gen_ptr
decrements, besides the rollback at the end of tb_gen_code(). Then,
since region_trees are indexed by code_gen_buffer pointers, and not
guest pointers, this should not introduce any stale entries.

While we might not need to find the ones created for the first
execution, we still need to find the ones for executions that fail -
and there is no way to tell in advance, which ones these are going to
be, so the idea here is to register all of them.

Am I missing something?

> r~
Alex Bennée Jan. 16, 2025, 10:51 a.m. UTC | #3
Ilya Leoshkevich <iii@linux.ibm.com> writes:

> On Wed, 2025-01-15 at 16:08 -0800, Richard Henderson wrote:
>> On 1/15/25 15:20, Ilya Leoshkevich wrote:
>> > Currently single-insn TBs created from I/O memory are not added to
>> > region_trees. Therefore, when they generate exceptions, they are
>> > not
>> > handled by cpu_restore_state_from_tb(). For x86 this is not a
>> > problem,
>> > because x86_restore_state_to_opc() only restores pc and cc, which
>> > are
>> > already correct. However, on several other architectures,
>> > restore_state_to_opc() restores more registers, and guests can
>> > notice
>> > incorrect values.
>> > 
>> > Fix by always calling tcg_tb_insert(). This may increase the size
>> > of
>> > region_trees, but tcg_region_reset_all() clears it once
>> > code_gen_buffer
>> > fills up, so it will not grow uncontrollably.
>> > 
>> > Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
>> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>> > ---
>> 
>> This needs something else.  The reason why they're not insertted is
>> that they're not valid 
>> for a second execution.  We need to not find them in the search tree.
>
> I have the impression that code_gen_buffer is append-only, so after we
> create a new TB for the second execution, the first TB should not
> be deleted - is this correct? At least I haven't found code_gen_ptr
> decrements, besides the rollback at the end of tb_gen_code(). Then,
> since region_trees are indexed by code_gen_buffer pointers, and not
> guest pointers, this should not introduce any stale entries.

We used to generate a temporary TB, execute it and then wind codeptr
back. We simplified the code to generate the TB but then not add it to
QHT - I think the original reasoning was saving on scarce CF_ flags and
not over complicating the tb_gen_code function. See around 873d64ac30
(accel/tcg: re-factor non-RAM execution code).

This does have the effect of potentially regenerating the same TB
over and over again but usually there only a few insns executed out of
IO space.

However some arches do more of this than others. And now we have devices
such as CXL where people do want to run large amounts of code directly
out of io address space. This eventually exhausts the codegen buffer so
we do a tb_flush() and start again. This is obviously sloooooow.

>
> While we might not need to find the ones created for the first
> execution, we still need to find the ones for executions that fail -
> and there is no way to tell in advance, which ones these are going to
> be, so the idea here is to register all of them.
>
> Am I missing something?
>
>> r~
Peter Maydell Jan. 16, 2025, 11:06 a.m. UTC | #4
On Thu, 16 Jan 2025 at 10:52, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Ilya Leoshkevich <iii@linux.ibm.com> writes:
>
> > On Wed, 2025-01-15 at 16:08 -0800, Richard Henderson wrote:
> >> On 1/15/25 15:20, Ilya Leoshkevich wrote:
> >> > Currently single-insn TBs created from I/O memory are not added to
> >> > region_trees. Therefore, when they generate exceptions, they are
> >> > not
> >> > handled by cpu_restore_state_from_tb(). For x86 this is not a
> >> > problem,
> >> > because x86_restore_state_to_opc() only restores pc and cc, which
> >> > are
> >> > already correct. However, on several other architectures,
> >> > restore_state_to_opc() restores more registers, and guests can
> >> > notice
> >> > incorrect values.
> >> >
> >> > Fix by always calling tcg_tb_insert(). This may increase the size
> >> > of
> >> > region_trees, but tcg_region_reset_all() clears it once
> >> > code_gen_buffer
> >> > fills up, so it will not grow uncontrollably.
> >> >
> >> > Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> >> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >> > ---
> >>
> >> This needs something else.  The reason why they're not insertted is
> >> that they're not valid
> >> for a second execution.  We need to not find them in the search tree.
> >
> > I have the impression that code_gen_buffer is append-only, so after we
> > create a new TB for the second execution, the first TB should not
> > be deleted - is this correct? At least I haven't found code_gen_ptr
> > decrements, besides the rollback at the end of tb_gen_code(). Then,
> > since region_trees are indexed by code_gen_buffer pointers, and not
> > guest pointers, this should not introduce any stale entries.
>
> We used to generate a temporary TB, execute it and then wind codeptr
> back. We simplified the code to generate the TB but then not add it to
> QHT - I think the original reasoning was saving on scarce CF_ flags and
> not over complicating the tb_gen_code function. See around 873d64ac30
> (accel/tcg: re-factor non-RAM execution code).
>
> This does have the effect of potentially regenerating the same TB
> over and over again but usually there only a few insns executed out of
> IO space.

You can't avoid regenerating the TB unless you somehow put the
entire bytes of the instruction into the hash you look up,
because you must re-read the MMIO region, and when you do the
second read you might read a different value than the first
time you read and created the TB.

The original reported problem here seems to me like it's a
problem with whatever target's frontend code this is.
This is a single instruction TB, so either:
 * the generated code for it completes the insn without
   raising an exception (no problem)
 * the generated code for it should raise an exception
   without having modified the CPU state (so there would
   be nothing to do for restore_state_to_opc)

It sounds like the target is generating code which does
something like:
 * do part of the instruction, including updating some of
   the CPU state
 * then decide it needs to raise an exception, and rely on
   the restore_state_to_opc handling to undo the state updates
   it did previously

The assumption of the "throwaway single insn TB" is that
you don't do that (i.e. that restore_state_to_opc is only
there for the benefit of multi-insn TBs).

thanks
-- PMM
Ilya Leoshkevich Jan. 16, 2025, 11:48 a.m. UTC | #5
On Thu, 2025-01-16 at 11:06 +0000, Peter Maydell wrote:
> On Thu, 16 Jan 2025 at 10:52, Alex Bennée <alex.bennee@linaro.org>
> wrote:
> > 
> > Ilya Leoshkevich <iii@linux.ibm.com> writes:
> > 
> > > On Wed, 2025-01-15 at 16:08 -0800, Richard Henderson wrote:
> > > > On 1/15/25 15:20, Ilya Leoshkevich wrote:
> > > > > Currently single-insn TBs created from I/O memory are not
> > > > > added to
> > > > > region_trees. Therefore, when they generate exceptions, they
> > > > > are
> > > > > not
> > > > > handled by cpu_restore_state_from_tb(). For x86 this is not a
> > > > > problem,
> > > > > because x86_restore_state_to_opc() only restores pc and cc,
> > > > > which
> > > > > are
> > > > > already correct. However, on several other architectures,
> > > > > restore_state_to_opc() restores more registers, and guests
> > > > > can
> > > > > notice
> > > > > incorrect values.
> > > > > 
> > > > > Fix by always calling tcg_tb_insert(). This may increase the
> > > > > size
> > > > > of
> > > > > region_trees, but tcg_region_reset_all() clears it once
> > > > > code_gen_buffer
> > > > > fills up, so it will not grow uncontrollably.
> > > > > 
> > > > > Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> > > > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > > > ---
> > > > 
> > > > This needs something else.  The reason why they're not
> > > > insertted is
> > > > that they're not valid
> > > > for a second execution.  We need to not find them in the search
> > > > tree.
> > > 
> > > I have the impression that code_gen_buffer is append-only, so
> > > after we
> > > create a new TB for the second execution, the first TB should not
> > > be deleted - is this correct? At least I haven't found
> > > code_gen_ptr
> > > decrements, besides the rollback at the end of tb_gen_code().
> > > Then,
> > > since region_trees are indexed by code_gen_buffer pointers, and
> > > not
> > > guest pointers, this should not introduce any stale entries.
> > 
> > We used to generate a temporary TB, execute it and then wind
> > codeptr
> > back. We simplified the code to generate the TB but then not add it
> > to
> > QHT - I think the original reasoning was saving on scarce CF_ flags
> > and
> > not over complicating the tb_gen_code function. See around
> > 873d64ac30
> > (accel/tcg: re-factor non-RAM execution code).
> > 
> > This does have the effect of potentially regenerating the same TB
> > over and over again but usually there only a few insns executed out
> > of
> > IO space.
> 
> You can't avoid regenerating the TB unless you somehow put the
> entire bytes of the instruction into the hash you look up,
> because you must re-read the MMIO region, and when you do the
> second read you might read a different value than the first
> time you read and created the TB.
> 
> The original reported problem here seems to me like it's a
> problem with whatever target's frontend code this is.
> This is a single instruction TB, so either:
>  * the generated code for it completes the insn without
>    raising an exception (no problem)
>  * the generated code for it should raise an exception
>    without having modified the CPU state (so there would
>    be nothing to do for restore_state_to_opc)
> 
> It sounds like the target is generating code which does
> something like:
>  * do part of the instruction, including updating some of
>    the CPU state
>  * then decide it needs to raise an exception, and rely on
>    the restore_state_to_opc handling to undo the state updates
>    it did previously
> 
> The assumption of the "throwaway single insn TB" is that
> you don't do that (i.e. that restore_state_to_opc is only
> there for the benefit of multi-insn TBs).
> 
> thanks
> -- PMM

The problem is not a partial state update in an instruction, but rather
that on some targets restore_state_to_opc is more than just a
"restore" - it is also "prepare for handling an exception", i.e.:

- arm: exception.syndrome
- hppa: unwind_breg, psw_n
- mips: btarget
- openrisc: ppc
- riscv: excp_uw2
- s390x: int_pgm_ilen

Some of these may be wrong due to unfamiliarity with the respective
architectures, sorry - but this illustrates the idea.
Peter Maydell Jan. 16, 2025, 12:24 p.m. UTC | #6
On Thu, 16 Jan 2025 at 11:48, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>
> On Thu, 2025-01-16 at 11:06 +0000, Peter Maydell wrote:
> > The original reported problem here seems to me like it's a
> > problem with whatever target's frontend code this is.
> > This is a single instruction TB, so either:
> >  * the generated code for it completes the insn without
> >    raising an exception (no problem)
> >  * the generated code for it should raise an exception
> >    without having modified the CPU state (so there would
> >    be nothing to do for restore_state_to_opc)
> >
> > It sounds like the target is generating code which does
> > something like:
> >  * do part of the instruction, including updating some of
> >    the CPU state
> >  * then decide it needs to raise an exception, and rely on
> >    the restore_state_to_opc handling to undo the state updates
> >    it did previously
> >
> > The assumption of the "throwaway single insn TB" is that
> > you don't do that (i.e. that restore_state_to_opc is only
> > there for the benefit of multi-insn TBs).

> The problem is not a partial state update in an instruction, but rather
> that on some targets restore_state_to_opc is more than just a
> "restore" - it is also "prepare for handling an exception", i.e.:
>
> - arm: exception.syndrome
> - hppa: unwind_breg, psw_n
> - mips: btarget
> - openrisc: ppc
> - riscv: excp_uw2
> - s390x: int_pgm_ilen
>
> Some of these may be wrong due to unfamiliarity with the respective
> architectures, sorry - but this illustrates the idea.

Ah, yes, thanks for the clear explanation. The "throw away
the TB" design didn't consider that (or vice-versa).

thanks
-- PMM
Richard Henderson Jan. 16, 2025, 2:54 p.m. UTC | #7
On 1/15/25 15:20, Ilya Leoshkevich wrote:
> Currently single-insn TBs created from I/O memory are not added to
> region_trees. Therefore, when they generate exceptions, they are not
> handled by cpu_restore_state_from_tb(). For x86 this is not a problem,
> because x86_restore_state_to_opc() only restores pc and cc, which are
> already correct. However, on several other architectures,
> restore_state_to_opc() restores more registers, and guests can notice
> incorrect values.
> 
> Fix by always calling tcg_tb_insert(). This may increase the size of
> region_trees, but tcg_region_reset_all() clears it once code_gen_buffer
> fills up, so it will not grow uncontrollably.
> 
> Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>   accel/tcg/translate-all.c | 16 ++++++++--------
>   1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 453eb20ec95..6333302813e 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>           tb_reset_jump(tb, 1);
>       }
>   
> +    /*
> +     * Insert TB into the corresponding region tree before publishing it
> +     * through QHT. Otherwise rewinding happened in the TB might fail to
> +     * lookup itself using host PC.
> +     */
> +    tcg_tb_insert(tb);

I think what we need is to mark the tb CF_INVALID before inserting it. That way we'll 
never match in tb_lookup (comparing guest state, including cflags), but *will* find it in 
tcg_tb_lookup (comparing host_pc).


r~
Ilya Leoshkevich Jan. 16, 2025, 3:06 p.m. UTC | #8
On Thu, 2025-01-16 at 06:54 -0800, Richard Henderson wrote:
> On 1/15/25 15:20, Ilya Leoshkevich wrote:
> > Currently single-insn TBs created from I/O memory are not added to
> > region_trees. Therefore, when they generate exceptions, they are
> > not
> > handled by cpu_restore_state_from_tb(). For x86 this is not a
> > problem,
> > because x86_restore_state_to_opc() only restores pc and cc, which
> > are
> > already correct. However, on several other architectures,
> > restore_state_to_opc() restores more registers, and guests can
> > notice
> > incorrect values.
> > 
> > Fix by always calling tcg_tb_insert(). This may increase the size
> > of
> > region_trees, but tcg_region_reset_all() clears it once
> > code_gen_buffer
> > fills up, so it will not grow uncontrollably.
> > 
> > Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
> >   accel/tcg/translate-all.c | 16 ++++++++--------
> >   1 file changed, 8 insertions(+), 8 deletions(-)
> > 
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index 453eb20ec95..6333302813e 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >           tb_reset_jump(tb, 1);
> >       }
> >   
> > +    /*
> > +     * Insert TB into the corresponding region tree before
> > publishing it
> > +     * through QHT. Otherwise rewinding happened in the TB might
> > fail to
> > +     * lookup itself using host PC.
> > +     */
> > +    tcg_tb_insert(tb);
> 
> I think what we need is to mark the tb CF_INVALID before inserting
> it. That way we'll 
> never match in tb_lookup (comparing guest state, including cflags),
> but *will* find it in 
> tcg_tb_lookup (comparing host_pc).
> 
> 
> r~

How can tb_lookup() find it? With this change, it is inserted into
region_trees, but not into tb_ctx.htable - this is done by
tb_link_page(), which is not called. And because it's not in
tb_ctx.htable, it can't end up in tb_jmp_cache either.
Alex Bennée Jan. 16, 2025, 3:09 p.m. UTC | #9
Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 16 Jan 2025 at 11:48, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>
>> On Thu, 2025-01-16 at 11:06 +0000, Peter Maydell wrote:
>> > The original reported problem here seems to me like it's a
>> > problem with whatever target's frontend code this is.
>> > This is a single instruction TB, so either:
>> >  * the generated code for it completes the insn without
>> >    raising an exception (no problem)
>> >  * the generated code for it should raise an exception
>> >    without having modified the CPU state (so there would
>> >    be nothing to do for restore_state_to_opc)
>> >
>> > It sounds like the target is generating code which does
>> > something like:
>> >  * do part of the instruction, including updating some of
>> >    the CPU state
>> >  * then decide it needs to raise an exception, and rely on
>> >    the restore_state_to_opc handling to undo the state updates
>> >    it did previously
>> >
>> > The assumption of the "throwaway single insn TB" is that
>> > you don't do that (i.e. that restore_state_to_opc is only
>> > there for the benefit of multi-insn TBs).
>
>> The problem is not a partial state update in an instruction, but rather
>> that on some targets restore_state_to_opc is more than just a
>> "restore" - it is also "prepare for handling an exception", i.e.:
>>
>> - arm: exception.syndrome
>> - hppa: unwind_breg, psw_n
>> - mips: btarget
>> - openrisc: ppc
>> - riscv: excp_uw2
>> - s390x: int_pgm_ilen
>>
>> Some of these may be wrong due to unfamiliarity with the respective
>> architectures, sorry - but this illustrates the idea.
>
> Ah, yes, thanks for the clear explanation. The "throw away
> the TB" design didn't consider that (or vice-versa).

We can certainly do with better docstrings for tcg_tb_lookup (via the
region tree) and tb_lookup (using cache and/or QHT) to make it clear the
difference between the two. I don't think we should ever use
tcg_tb_lookup for the purposes of executing a TB, just for resolution.

We have a few spare CF_ flags so maybe we could have a CF_RUNONCE flag
which is set for these TBs and assert its not set in tb_lookup along
with the current CF_INVALID flag. We could possibly set CF_INVALID
before executing the TB as we don't check the tb state from
tb_gen_code() before executing it but I guess that might be a little too
magic.

Rich, WDYT?

>
> thanks
> -- PMM
Philippe Mathieu-Daudé Jan. 16, 2025, 3:17 p.m. UTC | #10
On 16/1/25 16:09, Alex Bennée wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On Thu, 16 Jan 2025 at 11:48, Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>>>
>>> On Thu, 2025-01-16 at 11:06 +0000, Peter Maydell wrote:
>>>> The original reported problem here seems to me like it's a
>>>> problem with whatever target's frontend code this is.
>>>> This is a single instruction TB, so either:
>>>>   * the generated code for it completes the insn without
>>>>     raising an exception (no problem)
>>>>   * the generated code for it should raise an exception
>>>>     without having modified the CPU state (so there would
>>>>     be nothing to do for restore_state_to_opc)
>>>>
>>>> It sounds like the target is generating code which does
>>>> something like:
>>>>   * do part of the instruction, including updating some of
>>>>     the CPU state
>>>>   * then decide it needs to raise an exception, and rely on
>>>>     the restore_state_to_opc handling to undo the state updates
>>>>     it did previously
>>>>
>>>> The assumption of the "throwaway single insn TB" is that
>>>> you don't do that (i.e. that restore_state_to_opc is only
>>>> there for the benefit of multi-insn TBs).
>>
>>> The problem is not a partial state update in an instruction, but rather
>>> that on some targets restore_state_to_opc is more than just a
>>> "restore" - it is also "prepare for handling an exception", i.e.:
>>>
>>> - arm: exception.syndrome
>>> - hppa: unwind_breg, psw_n
>>> - mips: btarget
>>> - openrisc: ppc
>>> - riscv: excp_uw2
>>> - s390x: int_pgm_ilen

Should we move that to another TCGCPUOps handler?

>>> Some of these may be wrong due to unfamiliarity with the respective
>>> architectures, sorry - but this illustrates the idea.
>>
>> Ah, yes, thanks for the clear explanation. The "throw away
>> the TB" design didn't consider that (or vice-versa).
> 
> We can certainly do with better docstrings for tcg_tb_lookup (via the
> region tree) and tb_lookup (using cache and/or QHT) to make it clear the
> difference between the two. I don't think we should ever use
> tcg_tb_lookup for the purposes of executing a TB, just for resolution.
> 
> We have a few spare CF_ flags so maybe we could have a CF_RUNONCE flag
> which is set for these TBs and assert its not set in tb_lookup along
> with the current CF_INVALID flag. We could possibly set CF_INVALID
> before executing the TB as we don't check the tb state from
> tb_gen_code() before executing it but I guess that might be a little too
> magic.
> 
> Rich, WDYT?
> 
>>
>> thanks
>> -- PMM
>
Richard Henderson Jan. 16, 2025, 3:17 p.m. UTC | #11
On 1/16/25 07:06, Ilya Leoshkevich wrote:
> On Thu, 2025-01-16 at 06:54 -0800, Richard Henderson wrote:
>> On 1/15/25 15:20, Ilya Leoshkevich wrote:
>>> Currently single-insn TBs created from I/O memory are not added to
>>> region_trees. Therefore, when they generate exceptions, they are
>>> not
>>> handled by cpu_restore_state_from_tb(). For x86 this is not a
>>> problem,
>>> because x86_restore_state_to_opc() only restores pc and cc, which
>>> are
>>> already correct. However, on several other architectures,
>>> restore_state_to_opc() restores more registers, and guests can
>>> notice
>>> incorrect values.
>>>
>>> Fix by always calling tcg_tb_insert(). This may increase the size
>>> of
>>> region_trees, but tcg_region_reset_all() clears it once
>>> code_gen_buffer
>>> fills up, so it will not grow uncontrollably.
>>>
>>> Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
>>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>>> ---
>>>    accel/tcg/translate-all.c | 16 ++++++++--------
>>>    1 file changed, 8 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>>> index 453eb20ec95..6333302813e 100644
>>> --- a/accel/tcg/translate-all.c
>>> +++ b/accel/tcg/translate-all.c
>>> @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>            tb_reset_jump(tb, 1);
>>>        }
>>>    
>>> +    /*
>>> +     * Insert TB into the corresponding region tree before
>>> publishing it
>>> +     * through QHT. Otherwise rewinding happened in the TB might
>>> fail to
>>> +     * lookup itself using host PC.
>>> +     */
>>> +    tcg_tb_insert(tb);
>>
>> I think what we need is to mark the tb CF_INVALID before inserting
>> it. That way we'll
>> never match in tb_lookup (comparing guest state, including cflags),
>> but *will* find it in
>> tcg_tb_lookup (comparing host_pc).
>>
>>
>> r~
> 
> How can tb_lookup() find it? With this change, it is inserted into
> region_trees, but not into tb_ctx.htable - this is done by
> tb_link_page(), which is not called. And because it's not in
> tb_ctx.htable, it can't end up in tb_jmp_cache either.

You're absolutely right.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~
Alex Bennée Jan. 16, 2025, 3:40 p.m. UTC | #12
Richard Henderson <richard.henderson@linaro.org> writes:

> On 1/16/25 07:06, Ilya Leoshkevich wrote:
>> On Thu, 2025-01-16 at 06:54 -0800, Richard Henderson wrote:
>>> On 1/15/25 15:20, Ilya Leoshkevich wrote:
>>>> Currently single-insn TBs created from I/O memory are not added to
>>>> region_trees. Therefore, when they generate exceptions, they are
>>>> not
>>>> handled by cpu_restore_state_from_tb(). For x86 this is not a
>>>> problem,
>>>> because x86_restore_state_to_opc() only restores pc and cc, which
>>>> are
>>>> already correct. However, on several other architectures,
>>>> restore_state_to_opc() restores more registers, and guests can
>>>> notice
>>>> incorrect values.
>>>>
>>>> Fix by always calling tcg_tb_insert(). This may increase the size
>>>> of
>>>> region_trees, but tcg_region_reset_all() clears it once
>>>> code_gen_buffer
>>>> fills up, so it will not grow uncontrollably.
>>>>
>>>> Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
>>>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>>>> ---
>>>>    accel/tcg/translate-all.c | 16 ++++++++--------
>>>>    1 file changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
>>>> index 453eb20ec95..6333302813e 100644
>>>> --- a/accel/tcg/translate-all.c
>>>> +++ b/accel/tcg/translate-all.c
>>>> @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>>>            tb_reset_jump(tb, 1);
>>>>        }
>>>>    +    /*
>>>> +     * Insert TB into the corresponding region tree before
>>>> publishing it
>>>> +     * through QHT. Otherwise rewinding happened in the TB might
>>>> fail to
>>>> +     * lookup itself using host PC.
>>>> +     */
>>>> +    tcg_tb_insert(tb);
>>>
>>> I think what we need is to mark the tb CF_INVALID before inserting
>>> it. That way we'll
>>> never match in tb_lookup (comparing guest state, including cflags),
>>> but *will* find it in
>>> tcg_tb_lookup (comparing host_pc).
>>>
>>>
>>> r~
>> How can tb_lookup() find it? With this change, it is inserted into
>> region_trees, but not into tb_ctx.htable - this is done by
>> tb_link_page(), which is not called. And because it's not in
>> tb_ctx.htable, it can't end up in tb_jmp_cache either.
>
> You're absolutely right.
>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

It would still be nice to update the docstrings on the two lookup
functions to make it clear what they are for though.
Peter Maydell Jan. 16, 2025, 3:41 p.m. UTC | #13
On Thu, 16 Jan 2025 at 15:40, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
> > On 1/16/25 07:06, Ilya Leoshkevich wrote:
> >> On Thu, 2025-01-16 at 06:54 -0800, Richard Henderson wrote:
> >>> On 1/15/25 15:20, Ilya Leoshkevich wrote:
> >>>> Currently single-insn TBs created from I/O memory are not added to
> >>>> region_trees. Therefore, when they generate exceptions, they are
> >>>> not
> >>>> handled by cpu_restore_state_from_tb(). For x86 this is not a
> >>>> problem,
> >>>> because x86_restore_state_to_opc() only restores pc and cc, which
> >>>> are
> >>>> already correct. However, on several other architectures,
> >>>> restore_state_to_opc() restores more registers, and guests can
> >>>> notice
> >>>> incorrect values.
> >>>>
> >>>> Fix by always calling tcg_tb_insert(). This may increase the size
> >>>> of
> >>>> region_trees, but tcg_region_reset_all() clears it once
> >>>> code_gen_buffer
> >>>> fills up, so it will not grow uncontrollably.
> >>>>
> >>>> Co-developed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
> >>>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> >>>> ---
> >>>>    accel/tcg/translate-all.c | 16 ++++++++--------
> >>>>    1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> >>>> index 453eb20ec95..6333302813e 100644
> >>>> --- a/accel/tcg/translate-all.c
> >>>> +++ b/accel/tcg/translate-all.c
> >>>> @@ -531,23 +531,23 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >>>>            tb_reset_jump(tb, 1);
> >>>>        }
> >>>>    +    /*
> >>>> +     * Insert TB into the corresponding region tree before
> >>>> publishing it
> >>>> +     * through QHT. Otherwise rewinding happened in the TB might
> >>>> fail to
> >>>> +     * lookup itself using host PC.
> >>>> +     */
> >>>> +    tcg_tb_insert(tb);
> >>>
> >>> I think what we need is to mark the tb CF_INVALID before inserting
> >>> it. That way we'll
> >>> never match in tb_lookup (comparing guest state, including cflags),
> >>> but *will* find it in
> >>> tcg_tb_lookup (comparing host_pc).
> >>>
> >>>
> >>> r~
> >> How can tb_lookup() find it? With this change, it is inserted into
> >> region_trees, but not into tb_ctx.htable - this is done by
> >> tb_link_page(), which is not called. And because it's not in
> >> tb_ctx.htable, it can't end up in tb_jmp_cache either.
> >
> > You're absolutely right.
> >
> > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> It would still be nice to update the docstrings on the two lookup
> functions to make it clear what they are for though.

Given this thread, perhaps also consider briefly mentioning some
of the details in the commit message and/or the comment?

-- PMM
Ilya Leoshkevich Jan. 16, 2025, 4:08 p.m. UTC | #14
On Thu, 2025-01-16 at 15:41 +0000, Peter Maydell wrote:
> On Thu, 16 Jan 2025 at 15:40, Alex Bennée <alex.bennee@linaro.org>
> wrote:
> > 
> > Richard Henderson <richard.henderson@linaro.org> writes:
> > 
> > > On 1/16/25 07:06, Ilya Leoshkevich wrote:
> > > > On Thu, 2025-01-16 at 06:54 -0800, Richard Henderson wrote:
> > > > > On 1/15/25 15:20, Ilya Leoshkevich wrote:
> > > > > > Currently single-insn TBs created from I/O memory are not
> > > > > > added to
> > > > > > region_trees. Therefore, when they generate exceptions,
> > > > > > they are
> > > > > > not
> > > > > > handled by cpu_restore_state_from_tb(). For x86 this is not
> > > > > > a
> > > > > > problem,
> > > > > > because x86_restore_state_to_opc() only restores pc and cc,
> > > > > > which
> > > > > > are
> > > > > > already correct. However, on several other architectures,
> > > > > > restore_state_to_opc() restores more registers, and guests
> > > > > > can
> > > > > > notice
> > > > > > incorrect values.
> > > > > > 
> > > > > > Fix by always calling tcg_tb_insert(). This may increase
> > > > > > the size
> > > > > > of
> > > > > > region_trees, but tcg_region_reset_all() clears it once
> > > > > > code_gen_buffer
> > > > > > fills up, so it will not grow uncontrollably.
> > > > > > 
> > > > > > Co-developed-by: Nina Schoetterl-Glausch
> > > > > > <nsg@linux.ibm.com>
> > > > > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > > > > ---
> > > > > >    accel/tcg/translate-all.c | 16 ++++++++--------
> > > > > >    1 file changed, 8 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/accel/tcg/translate-all.c
> > > > > > b/accel/tcg/translate-all.c
> > > > > > index 453eb20ec95..6333302813e 100644
> > > > > > --- a/accel/tcg/translate-all.c
> > > > > > +++ b/accel/tcg/translate-all.c
> > > > > > @@ -531,23 +531,23 @@ TranslationBlock
> > > > > > *tb_gen_code(CPUState *cpu,
> > > > > >            tb_reset_jump(tb, 1);
> > > > > >        }
> > > > > >    +    /*
> > > > > > +     * Insert TB into the corresponding region tree before
> > > > > > publishing it
> > > > > > +     * through QHT. Otherwise rewinding happened in the TB
> > > > > > might
> > > > > > fail to
> > > > > > +     * lookup itself using host PC.
> > > > > > +     */
> > > > > > +    tcg_tb_insert(tb);
> > > > > 
> > > > > I think what we need is to mark the tb CF_INVALID before
> > > > > inserting
> > > > > it. That way we'll
> > > > > never match in tb_lookup (comparing guest state, including
> > > > > cflags),
> > > > > but *will* find it in
> > > > > tcg_tb_lookup (comparing host_pc).
> > > > > 
> > > > > 
> > > > > r~
> > > > How can tb_lookup() find it? With this change, it is inserted
> > > > into
> > > > region_trees, but not into tb_ctx.htable - this is done by
> > > > tb_link_page(), which is not called. And because it's not in
> > > > tb_ctx.htable, it can't end up in tb_jmp_cache either.
> > > 
> > > You're absolutely right.
> > > 
> > > Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> > 
> > It would still be nice to update the docstrings on the two lookup
> > functions to make it clear what they are for though.
> 
> Given this thread, perhaps also consider briefly mentioning some
> of the details in the commit message and/or the comment?
> 
> -- PMM

Ok. I will send a v2 with docstring additions in a separate patch.
diff mbox series

Patch

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 453eb20ec95..6333302813e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -531,23 +531,23 @@  TranslationBlock *tb_gen_code(CPUState *cpu,
         tb_reset_jump(tb, 1);
     }
 
+    /*
+     * Insert TB into the corresponding region tree before publishing it
+     * through QHT. Otherwise rewinding happened in the TB might fail to
+     * lookup itself using host PC.
+     */
+    tcg_tb_insert(tb);
+
     /*
      * If the TB is not associated with a physical RAM page then it must be
      * a temporary one-insn TB, and we have nothing left to do. Return early
-     * before attempting to link to other TBs or add to the lookup table.
+     * before attempting to link to other TBs.
      */
     if (tb_page_addr0(tb) == -1) {
         assert_no_pages_locked();
         return tb;
     }
 
-    /*
-     * Insert TB into the corresponding region tree before publishing it
-     * through QHT. Otherwise rewinding happened in the TB might fail to
-     * lookup itself using host PC.
-     */
-    tcg_tb_insert(tb);
-
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the
      * TB visible in a consistent state.