diff mbox series

Plugin Memory Callback Debugging

Message ID Y3QNRWUK8BLRQlaQ@strawberry.localdomain (mailing list archive)
State New, archived
Headers show
Series Plugin Memory Callback Debugging | expand

Commit Message

Aaron Lindsay Nov. 15, 2022, 10:05 p.m. UTC
Hello,

I have been wrestling with what might be a bug in the plugin memory
callbacks. The immediate error is that I hit the
`g_assert_not_reached()` in the 'default:' case in
qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
are obviously bogus (`len` was absurdly high, for example).  After doing
some further digging/instrumenting, I eventually found that
`free_dyn_cb_arr(void *p, ...)` is being called shortly before the
assertion is hit with `p` pointing to the same address as
`cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
the memory still pointed to by `cpu->plugin_mem_cbs`.

I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
end of an instruction/TB's execution, so its not exactly clear to me how this
is occurring. However, I suspect it may be relevant that we are calling
`free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`. 

I have additionally found that the below addition allows me to run successfully
without hitting the assert:

Unfortunately, the workload/setup I have encountered this bug with are
difficult to reproduce in a way suitable for sharing upstream (admittedly
potentially because I do not fully understand the conditions necessary to
trigger it). It is also deep into a run, and I haven't found a good way
to break in gdb immediately prior to it happening in order to inspect
it, without perturbing it enough such that it doesn't happen... 

I welcome any feedback or insights on how to further nail down the
failure case and/or help in working towards an appropriate solution.

Thanks!

-Aaron

Comments

Alex Bennée Nov. 15, 2022, 10:36 p.m. UTC | #1
Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> Hello,
>
> I have been wrestling with what might be a bug in the plugin memory
> callbacks. The immediate error is that I hit the
> `g_assert_not_reached()` in the 'default:' case in
> qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> are obviously bogus (`len` was absurdly high, for example).  After doing
> some further digging/instrumenting, I eventually found that
> `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> assertion is hit with `p` pointing to the same address as
> `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> the memory still pointed to by `cpu->plugin_mem_cbs`.
>
> I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> end of an instruction/TB's execution, so its not exactly clear to me how this
> is occurring. However, I suspect it may be relevant that we are calling
> `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.

Hmm I'm going to have to remind myself about how this bit works.

>
> I have additionally found that the below addition allows me to run successfully
> without hitting the assert:
>
> diff --git a/plugins/core.c b/plugins/core.c
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
>
>  void qemu_plugin_flush_cb(void)
>  {
> +    CPUState *cpu;
>      qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
>      qht_reset(&plugin.dyn_cb_arr_ht);
>
> +    CPU_FOREACH(cpu) {
> +        cpu->plugin_mem_cbs = NULL;
> +    }
> +

This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
I think we should be able to treat the CPUs separately.

>      plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
>  }
>
> Unfortunately, the workload/setup I have encountered this bug with are
> difficult to reproduce in a way suitable for sharing upstream (admittedly
> potentially because I do not fully understand the conditions necessary to
> trigger it). It is also deep into a run

How many full TB flushes have there been? You only see
qemu_plugin_flush_cb when we flush whole translation buffer (which is
something we do more often when plugins exit).

Does lowering tb-size make it easier to hit the failure mode?

> , and I haven't found a good way
> to break in gdb immediately prior to it happening in order to inspect
> it, without perturbing it enough such that it doesn't happen...

This is exactly the sort of thing rr is great for. Can you trigger it in
that?

  https://rr-project.org/

>
> I welcome any feedback or insights on how to further nail down the
> failure case and/or help in working towards an appropriate solution.
>
> Thanks!
>
> -Aaron
Aaron Lindsay Nov. 18, 2022, 9:58 p.m. UTC | #2
On Nov 15 22:36, Alex Bennée wrote:
> 
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> 
> > Hello,
> >
> > I have been wrestling with what might be a bug in the plugin memory
> > callbacks. The immediate error is that I hit the
> > `g_assert_not_reached()` in the 'default:' case in
> > qemu_plugin_vcpu_mem_cb, indicating the callback type was invalid. When
> > breaking on this assertion in gdb, the contents of cpu->plugin_mem_cbs
> > are obviously bogus (`len` was absurdly high, for example).  After doing
> > some further digging/instrumenting, I eventually found that
> > `free_dyn_cb_arr(void *p, ...)` is being called shortly before the
> > assertion is hit with `p` pointing to the same address as
> > `cpu->plugin_mem_cbs` will later hold at assertion-time. We are freeing
> > the memory still pointed to by `cpu->plugin_mem_cbs`.
> >
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > end of an instruction/TB's execution, so its not exactly clear to me how this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> 
> Hmm I'm going to have to remind myself about how this bit works.
> 
> >
> > I have additionally found that the below addition allows me to run successfully
> > without hitting the assert:
> >
> > diff --git a/plugins/core.c b/plugins/core.c
> > --- a/plugins/core.c
> > +++ b/plugins/core.c
> > @@ -427,9 +427,14 @@ static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)
> >
> >  void qemu_plugin_flush_cb(void)
> >  {
> > +    CPUState *cpu;
> >      qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
> >      qht_reset(&plugin.dyn_cb_arr_ht);
> >
> > +    CPU_FOREACH(cpu) {
> > +        cpu->plugin_mem_cbs = NULL;
> > +    }
> > +
> 
> This is essentially qemu_plugin_disable_mem_helpers() but for all CPUs.
> I think we should be able to treat the CPUs separately.

I agree it's similar to qemu_plugin_disable_mem_helpers(), but for all
CPUs. Though a perhaps important distinction is that its occurring
unconditionally in conjunction with the event which flushes the TBs and
frees the callback arrays.

Isn't the code calling into qemu_plugin_flush_cb() already clearing TBs
for all CPUs? Can you help me understand why treating the CPUs
separately would be better?

> >      plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
> >  }
> >
> > Unfortunately, the workload/setup I have encountered this bug with are
> > difficult to reproduce in a way suitable for sharing upstream (admittedly
> > potentially because I do not fully understand the conditions necessary to
> > trigger it). It is also deep into a run
> 
> How many full TB flushes have there been? You only see
> qemu_plugin_flush_cb when we flush whole translation buffer (which is
> something we do more often when plugins exit).

There have been maybe hundreds of TB flushes at this point (I, erm, use
qemu_plugin_reset() somewhat liberally in this plugin). I believe it is
the most recent such flush that is problematic - I observe the call to
free_dyn_cb_arr() mentioned above as a result of it.

> Does lowering tb-size make it easier to hit the failure mode?

Hrm, interesting, I have not tried that. I'll poke at that if the rr
debug doesn't pan out.

> > , and I haven't found a good way
> > to break in gdb immediately prior to it happening in order to inspect
> > it, without perturbing it enough such that it doesn't happen...
> 
> This is exactly the sort of thing rr is great for. Can you trigger it in
> that?
> 
>   https://rr-project.org/

I had not used rr before, thanks for the push to do so!

I have, so far, discovered the following timeline:
1. My plugin receives a instruction execution callback for a load
   instruction. At this time, cpu->plugin_mem_cbs points to the same
   memory which will later be freed
2. During the handling of this callback, my plugin calls qemu_plugin_reset()
3. Ostensibly something goes wrong here with the cleanup of
   cpu->plugin_mem_cbs??? 
4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
   to by cpu->plugin_mem_cbs 

Since I have this nicely recorded now with rr, it ought to be easier to
poke at, though I admit I'm not entirely sure how to poke at the
generated code to see what's going wrong (i.e. why wouldn't the tb exit
stuff be clearing this pointer like normal?).

> > I welcome any feedback or insights on how to further nail down the
> > failure case and/or help in working towards an appropriate solution.

-Aaron
Aaron Lindsay Nov. 18, 2022, 10:02 p.m. UTC | #3
Sorry, left off the very end of my timeline:

On Nov 18 16:58, Aaron Lindsay wrote:
> I have, so far, discovered the following timeline:
> 1. My plugin receives a instruction execution callback for a load
>    instruction. At this time, cpu->plugin_mem_cbs points to the same
>    memory which will later be freed
> 2. During the handling of this callback, my plugin calls qemu_plugin_reset()
> 3. Ostensibly something goes wrong here with the cleanup of
>    cpu->plugin_mem_cbs??? 
> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
>    to by cpu->plugin_mem_cbs 

5. A store exclusive instruction is translated and then executed, which
   requires the use of a helper. When executed, this helper checks
   cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
   and use it, resulting in the assertion. 

-Aaron
Aaron Lindsay Nov. 21, 2022, 8:18 p.m. UTC | #4
On Nov 15 22:36, Alex Bennée wrote:
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > end of an instruction/TB's execution, so its not exactly clear to me how this
> > is occurring. However, I suspect it may be relevant that we are calling
> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> 
> Hmm I'm going to have to remind myself about how this bit works.

When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
set for an instruction? Is it guaranteed it is reset by the end of the
tb? If I were to put an assertion in cpu_tb_exec() just after the call
to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL there?

In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
for an instruction, and then not reset it to NULL. I'm wondering if its
getting optimized away or something, but want to make sure I've got my
assumptions correct about how this is intended to be working.

Thanks!

-Aaron
Alex Bennée Nov. 21, 2022, 9:51 p.m. UTC | #5
Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Nov 15 22:36, Alex Bennée wrote:
>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>> > end of an instruction/TB's execution, so its not exactly clear to me how this
>> > is occurring. However, I suspect it may be relevant that we are calling
>> > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>> 
>> Hmm I'm going to have to remind myself about how this bit works.
>
> When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> set for an instruction? Is it guaranteed it is reset by the end of the
> tb?

It should be by the end of the instruction. See
inject_mem_disable_helper() which inserts TCG code to disable the
helpers. We also have plugin_gen_disable_mem_helpers() which should
catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
why qemu_plugin_disable_mem_helpers() is only really concerned about
when we longjmp out of the loop.

> If I were to put an assertion in cpu_tb_exec() just after the call
> to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> there?

Yes I think so. 

> In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
> for an instruction, and then not reset it to NULL. I'm wondering if its
> getting optimized away or something, but want to make sure I've got my
> assumptions correct about how this is intended to be working.

We are expecting some stuff to dead code away (but hopefully in pairs).
We don't know ahead of the instruction decode if it will be a memory
instruction so opportunistically insert our empty helper calls. If no
memory access is done those ops should be gone. This is all done to
avoid doing a 2 pass translate.

Richard,

Have I got that right? I think thats how I remember the TCG code working.

>
> Thanks!
>
> -Aaron
Alex Bennée Nov. 21, 2022, 10:02 p.m. UTC | #6
Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> Sorry, left off the very end of my timeline:
>
> On Nov 18 16:58, Aaron Lindsay wrote:
>> I have, so far, discovered the following timeline:
>> 1. My plugin receives a instruction execution callback for a load
>>    instruction. At this time, cpu->plugin_mem_cbs points to the same
>>    memory which will later be freed
>> 2. During the handling of this callback, my plugin calls
>qemu_plugin_reset()

The final plugin reset should only execute in the safe async context
(i.e. no other vCPUs running code). That flushes all current generated
code.

>> 3. Ostensibly something goes wrong here with the cleanup of
>>    cpu->plugin_mem_cbs???

This may be missed by the reset path (hence your patch) but it should be
being reset every instruction we instrument.

>> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
>>    to by cpu->plugin_mem_cbs 
>
> 5. A store exclusive instruction is translated and then executed, which
>    requires the use of a helper. When executed, this helper checks
>    cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
>    and use it, resulting in the assertion.

It should be being reset for each instruction I think.

>
> -Aaron
Richard Henderson Nov. 22, 2022, 2:22 a.m. UTC | #7
On 11/21/22 13:51, Alex Bennée wrote:
> 
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> 
>> On Nov 15 22:36, Alex Bennée wrote:
>>> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>>>> I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>>>> end of an instruction/TB's execution, so its not exactly clear to me how this
>>>> is occurring. However, I suspect it may be relevant that we are calling
>>>> `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>>>
>>> Hmm I'm going to have to remind myself about how this bit works.
>>
>> When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
>> set for an instruction? Is it guaranteed it is reset by the end of the
>> tb?
> 
> It should be by the end of the instruction. See
> inject_mem_disable_helper() which inserts TCG code to disable the
> helpers. We also have plugin_gen_disable_mem_helpers() which should
> catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> why qemu_plugin_disable_mem_helpers() is only really concerned about
> when we longjmp out of the loop.
> 
>> If I were to put an assertion in cpu_tb_exec() just after the call
>> to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
>> there?
> 
> Yes I think so.

Indeed.

As an aside, this field should be moved out of CPUState to DisasContextBase, because it's 
only used in code generation.

> 
>> In my debugging, I *think* I'm seeing a tb set cpu->plugin_mem_cbs
>> for an instruction, and then not reset it to NULL. I'm wondering if its
>> getting optimized away or something, but want to make sure I've got my
>> assumptions correct about how this is intended to be working.
> 
> We are expecting some stuff to dead code away (but hopefully in pairs).
> We don't know ahead of the instruction decode if it will be a memory
> instruction so opportunistically insert our empty helper calls. If no
> memory access is done those ops should be gone. This is all done to
> avoid doing a 2 pass translate.
> 
> Richard,
> 
> Have I got that right? I think thats how I remember the TCG code working.

That's about right.


r~
Aaron Lindsay Nov. 22, 2022, 3:57 p.m. UTC | #8
On Nov 21 18:22, Richard Henderson wrote:
> On 11/21/22 13:51, Alex Bennée wrote:
> > 
> > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > 
> > > On Nov 15 22:36, Alex Bennée wrote:
> > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
> > > > > is occurring. However, I suspect it may be relevant that we are calling
> > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > 
> > > > Hmm I'm going to have to remind myself about how this bit works.
> > > 
> > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > tb?
> > 
> > It should be by the end of the instruction. See
> > inject_mem_disable_helper() which inserts TCG code to disable the
> > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > when we longjmp out of the loop.
> > 
> > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > there?
> > 
> > Yes I think so.
> 
> Indeed.

Well, the good news is that if this is an assumption we're relying on, it is
now trivial to reproduce the problem!

Compile some simple program (doesn't really matter, the issue gets triggered
early):

$ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple

Make this change to cpu_tb_exec():

> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 356fe348de..50a010327d 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
> 
>      qemu_thread_jit_execute();
>      ret = tcg_qemu_tb_exec(env, tb_ptr);
> +    if (cpu->plugin_mem_cbs != NULL) {
> +        g_assert_not_reached();
> +    }
>      cpu->can_do_io = 1;
>      /*
>       * TODO: Delay swapping back to the read-write region of the TB

And run:

$ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple

You should fairly quickly see something like:

> [snip]
> 0, 0x5502814d04, 0xb4000082, ""
> 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> 0, 0x5502814d0c, 0xf1001c1f, ""
> **
> ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached

When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
pointer changing from one non-null value to another (which also seems to
indicate it is not being cleared between instructions).

Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
getting optimized away, but not the code to set it in the first place?

-Aaron
Aaron Lindsay Nov. 22, 2022, 5:05 p.m. UTC | #9
On Nov 21 22:02, Alex Bennée wrote:
> 
> Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> 
> > Sorry, left off the very end of my timeline:
> >
> > On Nov 18 16:58, Aaron Lindsay wrote:
> >> I have, so far, discovered the following timeline:
> >> 1. My plugin receives a instruction execution callback for a load
> >>    instruction. At this time, cpu->plugin_mem_cbs points to the same
> >>    memory which will later be freed
> >> 2. During the handling of this callback, my plugin calls
> >qemu_plugin_reset()
> 
> The final plugin reset should only execute in the safe async context
> (i.e. no other vCPUs running code). That flushes all current generated
> code.
> 
> >> 3. Ostensibly something goes wrong here with the cleanup of
> >>    cpu->plugin_mem_cbs???
> 
> This may be missed by the reset path (hence your patch) but it should be
> being reset every instruction we instrument.
> 
> >> 4. Step 2 triggers the TBs to be flushed, which frees the memory pointed
> >>    to by cpu->plugin_mem_cbs 
> >
> > 5. A store exclusive instruction is translated and then executed, which
> >    requires the use of a helper. When executed, this helper checks
> >    cpu->plugin_mem_cbs, which is non-null, so it attempts to dereference
> >    and use it, resulting in the assertion.
> 
> It should be being reset for each instruction I think.

FYI - I suspect my above presentation of the problem suffered from the
"searching where the streetlamp is instead of where you lost something"
problem. In other words, I did/do observe the error at reset, but I now
believe that is merely where it is easiest to observe. The
cpu->plugin_mem_cbs doesn't appear to be reset at the end of
instructions and manifests at reset because that's when the underlying
memory is freed.

-Aaron
Aaron Lindsay Nov. 29, 2022, 8:37 p.m. UTC | #10
On Nov 22 10:57, Aaron Lindsay wrote:
> On Nov 21 18:22, Richard Henderson wrote:
> > On 11/21/22 13:51, Alex Bennée wrote:
> > > 
> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > 
> > > > On Nov 15 22:36, Alex Bennée wrote:
> > > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
> > > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
> > > > > > is occurring. However, I suspect it may be relevant that we are calling
> > > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
> > > > > 
> > > > > Hmm I'm going to have to remind myself about how this bit works.
> > > > 
> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
> > > > set for an instruction? Is it guaranteed it is reset by the end of the
> > > > tb?
> > > 
> > > It should be by the end of the instruction. See
> > > inject_mem_disable_helper() which inserts TCG code to disable the
> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
> > > when we longjmp out of the loop.
> > > 
> > > > If I were to put an assertion in cpu_tb_exec() just after the call
> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
> > > > there?
> > > 
> > > Yes I think so.
> > 
> > Indeed.
> 
> Well, the good news is that if this is an assumption we're relying on, it is
> now trivial to reproduce the problem!
> 
> Compile some simple program (doesn't really matter, the issue gets triggered
> early):
> 
> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
> 
> Make this change to cpu_tb_exec():
> 
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 356fe348de..50a010327d 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
> > 
> >      qemu_thread_jit_execute();
> >      ret = tcg_qemu_tb_exec(env, tb_ptr);
> > +    if (cpu->plugin_mem_cbs != NULL) {
> > +        g_assert_not_reached();
> > +    }
> >      cpu->can_do_io = 1;
> >      /*
> >       * TODO: Delay swapping back to the read-write region of the TB
> 
> And run:
> 
> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
> 
> You should fairly quickly see something like:
> 
> > [snip]
> > 0, 0x5502814d04, 0xb4000082, ""
> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
> > 0, 0x5502814d0c, 0xf1001c1f, ""
> > **
> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
> 
> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
> pointer changing from one non-null value to another (which also seems to
> indicate it is not being cleared between instructions).
> 
> Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> getting optimized away, but not the code to set it in the first place?

Is there anyone who could help take a look at this from the code gen
perspective?

-Aaron
Alex Bennée Dec. 1, 2022, 7:32 p.m. UTC | #11
Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> On Nov 22 10:57, Aaron Lindsay wrote:
>> On Nov 21 18:22, Richard Henderson wrote:
>> > On 11/21/22 13:51, Alex Bennée wrote:
>> > > 
>> > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > > 
>> > > > On Nov 15 22:36, Alex Bennée wrote:
>> > > > > Aaron Lindsay <aaron@os.amperecomputing.com> writes:
>> > > > > > I believe the code *should* always reset `cpu->plugin_mem_cbs` to NULL at the
>> > > > > > end of an instruction/TB's execution, so its not exactly clear to me how this
>> > > > > > is occurring. However, I suspect it may be relevant that we are calling
>> > > > > > `free_dyn_cb_arr()` because my plugin called `qemu_plugin_reset()`.
>> > > > > 
>> > > > > Hmm I'm going to have to remind myself about how this bit works.
>> > > > 
>> > > > When is it expected that cpu->plugin_mem_cbs is reset to NULL if it is
>> > > > set for an instruction? Is it guaranteed it is reset by the end of the
>> > > > tb?
>> > > 
>> > > It should be by the end of the instruction. See
>> > > inject_mem_disable_helper() which inserts TCG code to disable the
>> > > helpers. We also have plugin_gen_disable_mem_helpers() which should
>> > > catch every exit out of a block (exit_tb, goto_tb, goto_ptr). That is
>> > > why qemu_plugin_disable_mem_helpers() is only really concerned about
>> > > when we longjmp out of the loop.
>> > > 
>> > > > If I were to put an assertion in cpu_tb_exec() just after the call
>> > > > to tcg_qemu_tb_exec(), should cpu->plugin_mem_cbs always be NULL
>> > > > there?
>> > > 
>> > > Yes I think so.
>> > 
>> > Indeed.
>> 
>> Well, the good news is that if this is an assumption we're relying on, it is
>> now trivial to reproduce the problem!
>> 
>> Compile some simple program (doesn't really matter, the issue gets triggered
>> early):
>> 
>> $ echo "int main() { return 0; }" > simple.c && gcc simple.c -o simple
>> 
>> Make this change to cpu_tb_exec():
>> 
>> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> > index 356fe348de..50a010327d 100644
>> > --- a/accel/tcg/cpu-exec.c
>> > +++ b/accel/tcg/cpu-exec.c
>> > @@ -436,6 +436,9 @@ cpu_tb_exec(CPUState *cpu, TranslationBlock *itb, int *tb_exit)
>> > 
>> >      qemu_thread_jit_execute();
>> >      ret = tcg_qemu_tb_exec(env, tb_ptr);
>> > +    if (cpu->plugin_mem_cbs != NULL) {
>> > +        g_assert_not_reached();
>> > +    }
>> >      cpu->can_do_io = 1;
>> >      /*
>> >       * TODO: Delay swapping back to the read-write region of the TB
>> 
>> And run:
>> 
>> $ ./build/qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d plugin ./simple
>> 
>> You should fairly quickly see something like:
>> 
>> > [snip]
>> > 0, 0x5502814d04, 0xb4000082, ""
>> > 0, 0x5502814d08, 0xf9400440, "", load, 0x5502844ed0
>> > 0, 0x5502814d0c, 0xf1001c1f, ""

Hmm why are you not getting any opcodes there? Missing capstone?

>> > **
>> > ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code should not be reached
>> > Bail out! ERROR:../accel/tcg/cpu-exec.c:440:cpu_tb_exec: code
>> > should not be reached

Hmm I can replicate so I need to check my understanding. It fails in the
first block:


  ./qemu-aarch64 -plugin contrib/plugins/libexeclog.so -d \
    plugin,in_asm,op,op_opt,out_asm  ./tests/tcg/aarch64-linux-user/sha512

gives:

  PROLOGUE: [size=45]
  0x7f4b64000000:  55                       pushq    %rbp
  0x7f4b64000001:  53                       pushq    %rbx
  0x7f4b64000002:  41 54                    pushq    %r12
  0x7f4b64000004:  41 55                    pushq    %r13
  0x7f4b64000006:  41 56                    pushq    %r14
  0x7f4b64000008:  41 57                    pushq    %r15
  0x7f4b6400000a:  48 8b ef                 movq     %rdi, %rbp
  0x7f4b6400000d:  48 81 c4 78 fb ff ff     addq     $-0x488, %rsp
  0x7f4b64000014:  ff e6                    jmpq     *%rsi
  0x7f4b64000016:  33 c0                    xorl     %eax, %eax
  0x7f4b64000018:  48 81 c4 88 04 00 00     addq     $0x488, %rsp
  0x7f4b6400001f:  c5 f8 77                 vzeroupper 
  0x7f4b64000022:  41 5f                    popq     %r15
  0x7f4b64000024:  41 5e                    popq     %r14
  0x7f4b64000026:  41 5d                    popq     %r13
  0x7f4b64000028:  41 5c                    popq     %r12
  0x7f4b6400002a:  5b                       popq     %rbx
  0x7f4b6400002b:  5d                       popq     %rbp
  0x7f4b6400002c:  c3                       retq     

  ----------------
  IN: 
  0x004005d0:  d280001d  movz     x29, #0
  0x004005d4:  d280001e  movz     x30, #0
  0x004005d8:  aa0003e5  mov      x5, x0
  0x004005dc:  f94003e1  ldr      x1, [sp]
  0x004005e0:  910023e2  add      x2, sp, #8
  0x004005e4:  910003e6  mov      x6, sp
  0x004005e8:  90000000  adrp     x0, #0x400000
  0x004005ec:  91182000  add      x0, x0, #0x608
  0x004005f0:  b0000023  adrp     x3, #0x405000
  0x004005f4:  91014063  add      x3, x3, #0x50
  0x004005f8:  b0000024  adrp     x4, #0x405000
  0x004005fc:  91044084  add      x4, x4, #0x110
  0x00400600:  940010e8  bl       #0x4049a0

  OP:
   ld_i32 tmp0,env,$0xfffffffffffffff0
   brcond_i32 tmp0,$0x0,lt,$L0

   ---- 00000000004005d0 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff203430
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 x29,$0x0

   ---- 00000000004005d4 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff202800
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 lr,$0x0

   ---- 00000000004005d8 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff203400
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 x5,x0

This is a memory annotated instruction:

   ---- 00000000004005dc 0000000000000000 0000000000000f06
   mov_i64 tmp2,$0x55c0ff1a6150
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 tmp2,sp
   shl_i64 tmp4,tmp2,$0x8
   sar_i64 tmp4,tmp4,$0x8
   and_i64 tmp4,tmp4,tmp2
   mov_i64 tmp7,tmp4
   qemu_ld_i64 x1,tmp7,leq,0
   mov_i32 tmp8,$0x10030
   mov_i64 tmp11,$0x0
   ld_i32 tmp0,env,$0xffffffffffffffa8
   mov_i64 tmp10,tmp7
   call plugin(0x7f4b71c14388),$0x1,$0,tmp0,tmp8,tmp10,tmp11

   ---- 00000000004005e0 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff1fa4e0
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   add_i64 tmp2,sp,$0x8
   mov_i64 x2,tmp2

   ---- 00000000004005e4 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff193500
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 tmp2,sp
   mov_i64 x6,tmp2

   ---- 00000000004005e8 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff219700
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 x0,$0x400000

   ---- 00000000004005ec 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff21d160
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   add_i64 tmp2,x0,$0x608
   mov_i64 x0,tmp2

   ---- 00000000004005f0 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff217f80
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 x3,$0x405000

   ---- 00000000004005f4 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff2180c0
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   add_i64 tmp2,x3,$0x50
   mov_i64 x3,tmp2

   ---- 00000000004005f8 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff21c4b0
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 x4,$0x405000

   ---- 00000000004005fc 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff21c590
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   add_i64 tmp2,x4,$0x110
   mov_i64 x4,tmp2

   ---- 0000000000400600 0000000000000000 0000000000000000
   mov_i64 tmp2,$0x55c0ff217cd0
   st_i64 tmp2,env,$0xffffffffffffff90
   mov_i64 tmp2,$0x55c0ff21c670
   ld_i32 tmp0,env,$0xffffffffffffffa8
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,tmp2
   mov_i64 lr,$0x400604
   mov_i64 pc,$0x4049a0
   call lookup_tb_ptr,$0x6,$1,tmp2,env
   goto_ptr tmp2
   set_label $L0
   exit_tb $0x7f4b64000043

  OP after optimization and liveness analysis:
   ld_i32 tmp0,env,$0xfffffffffffffff0      pref=0xffff
   brcond_i32 tmp0,$0x0,lt,$L0              dead: 0 1

   ---- 00000000004005d0 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff203430  dead: 0 1
   mov_i64 x29,$0x0                         sync: 0  dead: 0  pref=0xffff

   ---- 00000000004005d4 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff202800  dead: 0 1
   mov_i64 lr,$0x0                          sync: 0  dead: 0  pref=0xffff

   ---- 00000000004005d8 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff203400  dead: 0 1
   mov_i64 x5,x0                            sync: 0  dead: 0 1  pref=0xffff

   ---- 00000000004005dc 0000000000000000 0000000000000f06
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff1a6150  dead: 0 1
   shl_i64 tmp4,sp,$0x8                     pref=0xffff
   sar_i64 tmp4,tmp4,$0x8                   dead: 1  pref=0xffff
   and_i64 tmp4,tmp4,sp                     dead: 1  pref=0xffff
   mov_i64 tmp7,tmp4                        dead: 1  pref=0xf038
   qemu_ld_i64 x1,tmp7,leq,0                sync: 0  dead: 0  pref=0xffff
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   mov_i64 tmp10,tmp7                       dead: 1  pref=0x4
   call plugin(0x7f4b71c14388),$0x1,$0,tmp0,$0x10030,tmp10,$0x0  dead: 0 1 2 3

   ---- 00000000004005e0 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff1fa4e0  dead: 0 1
   add_i64 tmp2,sp,$0x8                     dead: 2  pref=0xffff
   mov_i64 x2,tmp2                          sync: 0  dead: 0 1  pref=0xffff

   ---- 00000000004005e4 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff193500  dead: 0 1
   mov_i64 x6,sp                            sync: 0  dead: 0 1  pref=0xffff

   ---- 00000000004005e8 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff219700  dead: 0 1

   ---- 00000000004005ec 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21d160  dead: 0 1
   mov_i64 x0,$0x400608                     sync: 0  dead: 0 1  pref=0xffff

   ---- 00000000004005f0 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff217f80  dead: 0 1

   ---- 00000000004005f4 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff2180c0  dead: 0 1
   mov_i64 x3,$0x405050                     sync: 0  dead: 0 1  pref=0xffff

   ---- 00000000004005f8 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c4b0  dead: 0 1

   ---- 00000000004005fc 0000000000000000 0000000000000000
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c590  dead: 0 1
   mov_i64 x4,$0x405110                     sync: 0  dead: 0 1  pref=0xffff

   ---- 0000000000400600 0000000000000000 0000000000000000
   st_i64 $0x55c0ff217cd0,env,$0xffffffffffffff90  dead: 0
   ld_i32 tmp0,env,$0xffffffffffffffa8      pref=0x80
   call plugin(0x7f4b71c1449f),$0x1,$0,tmp0,$0x55c0ff21c670  dead: 0 1
   mov_i64 lr,$0x400604                     sync: 0  dead: 0 1  pref=0xffff
   mov_i64 pc,$0x4049a0                     sync: 0  dead: 0 1  pref=0xffff
   call lookup_tb_ptr,$0x6,$1,tmp2,env      dead: 1  pref=none
   goto_ptr tmp2                            dead: 0
   set_label $L0                          
   exit_tb $0x7f4b64000043                

  OUT: [size=432]
    -- guest addr 0x00000000004005d0 + tb prologue
  0x7f4b64000100:  8b 5d f0                 movl     -0x10(%rbp), %ebx
  0x7f4b64000103:  85 db                    testl    %ebx, %ebx
  0x7f4b64000105:  0f 8c 8a 01 00 00        jl       0x7f4b64000295
  0x7f4b6400010b:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b6400010e:  48 be 30 34 20 ff c0 55  movabsq  $0x55c0ff203430, %rsi
  0x7f4b64000116:  00 00
  0x7f4b64000118:  e8 82 43 c1 0d           callq    0x7f4b71c1449f
  0x7f4b6400011d:  48 c7 85 28 01 00 00 00  movq     $0, 0x128(%rbp)
  0x7f4b64000125:  00 00 00
    -- guest addr 0x00000000004005d4
  0x7f4b64000128:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b6400012b:  48 be 00 28 20 ff c0 55  movabsq  $0x55c0ff202800, %rsi
  0x7f4b64000133:  00 00
  0x7f4b64000135:  e8 65 43 c1 0d           callq    0x7f4b71c1449f
  0x7f4b6400013a:  48 c7 85 30 01 00 00 00  movq     $0, 0x130(%rbp)
  0x7f4b64000142:  00 00 00
    -- guest addr 0x00000000004005d8
  0x7f4b64000145:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b64000148:  48 be 00 34 20 ff c0 55  movabsq  $0x55c0ff203400, %rsi
  0x7f4b64000150:  00 00
  0x7f4b64000152:  e8 48 43 c1 0d           callq    0x7f4b71c1449f
  0x7f4b64000157:  48 8b 5d 40              movq     0x40(%rbp), %rbx
  0x7f4b6400015b:  48 89 5d 68              movq     %rbx, 0x68(%rbp)
    -- guest addr 0x00000000004005dc
  0x7f4b6400015f:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b64000162:  48 be 50 61 1a ff c0 55  movabsq  $0x55c0ff1a6150, %rsi
  0x7f4b6400016a:  00 00
  0x7f4b6400016c:  e8 2e 43 c1 0d           callq    0x7f4b71c1449f
  0x7f4b64000171:  48 8b 9d 38 01 00 00     movq     0x138(%rbp), %rbx
  0x7f4b64000178:  4c 8b e3                 movq     %rbx, %r12
  0x7f4b6400017b:  49 c1 e4 08              shlq     $8, %r12
  0x7f4b6400017f:  49 c1 fc 08              sarq     $8, %r12
  0x7f4b64000183:  4c 23 e3                 andq     %rbx, %r12
  0x7f4b64000186:  4d 8b 2c 24              movq     (%r12), %r13
  0x7f4b6400018a:  4c 89 6d 48              movq     %r13, 0x48(%rbp)
  0x7f4b6400018e:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b64000191:  be 30 00 01 00           movl     $0x10030, %esi
  0x7f4b64000196:  49 8b d4                 movq     %r12, %rdx
  0x7f4b64000199:  33 c9                    xorl     %ecx, %ecx
  0x7f4b6400019b:  e8 e8 41 c1 0d           callq    0x7f4b71c14388
    -- guest addr 0x00000000004005e0
  0x7f4b640001a0:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b640001a3:  48 be e0 a4 1f ff c0 55  movabsq  $0x55c0ff1fa4e0, %rsi
  0x7f4b640001ab:  00 00
  0x7f4b640001ad:  e8 ed 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b640001b2:  4c 8d 63 08              leaq     8(%rbx), %r12
  0x7f4b640001b6:  4c 89 65 50              movq     %r12, 0x50(%rbp)
    -- guest addr 0x00000000004005e4
  0x7f4b640001ba:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b640001bd:  48 be 00 35 19 ff c0 55  movabsq  $0x55c0ff193500, %rsi
  0x7f4b640001c5:  00 00
  0x7f4b640001c7:  e8 d3 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b640001cc:  48 89 5d 70              movq     %rbx, 0x70(%rbp)
    -- guest addr 0x00000000004005e8
  0x7f4b640001d0:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b640001d3:  48 be 00 97 21 ff c0 55  movabsq  $0x55c0ff219700, %rsi
  0x7f4b640001db:  00 00
  0x7f4b640001dd:  e8 bd 42 c1 0d           callq    0x7f4b71c1449f
    -- guest addr 0x00000000004005ec
  0x7f4b640001e2:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b640001e5:  48 be 60 d1 21 ff c0 55  movabsq  $0x55c0ff21d160, %rsi
  0x7f4b640001ed:  00 00
  0x7f4b640001ef:  e8 ab 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b640001f4:  48 c7 45 40 08 06 40 00  movq     $0x400608, 0x40(%rbp)
    -- guest addr 0x00000000004005f0
  0x7f4b640001fc:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b640001ff:  48 be 80 7f 21 ff c0 55  movabsq  $0x55c0ff217f80, %rsi
  0x7f4b64000207:  00 00
  0x7f4b64000209:  e8 91 42 c1 0d           callq    0x7f4b71c1449f
    -- guest addr 0x00000000004005f4
  0x7f4b6400020e:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b64000211:  48 be c0 80 21 ff c0 55  movabsq  $0x55c0ff2180c0, %rsi
  0x7f4b64000219:  00 00
  0x7f4b6400021b:  e8 7f 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b64000220:  48 c7 45 58 50 50 40 00  movq     $0x405050, 0x58(%rbp)
    -- guest addr 0x00000000004005f8
  0x7f4b64000228:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b6400022b:  48 be b0 c4 21 ff c0 55  movabsq  $0x55c0ff21c4b0, %rsi
  0x7f4b64000233:  00 00
  0x7f4b64000235:  e8 65 42 c1 0d           callq    0x7f4b71c1449f
    -- guest addr 0x00000000004005fc
  0x7f4b6400023a:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b6400023d:  48 be 90 c5 21 ff c0 55  movabsq  $0x55c0ff21c590, %rsi
  0x7f4b64000245:  00 00
  0x7f4b64000247:  e8 53 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b6400024c:  48 c7 45 60 10 51 40 00  movq     $0x405110, 0x60(%rbp)
    -- guest addr 0x0000000000400600
  0x7f4b64000254:  48 bb d0 7c 21 ff c0 55  movabsq  $0x55c0ff217cd0, %rbx
  0x7f4b6400025c:  00 00
  0x7f4b6400025e:  48 89 5d 90              movq     %rbx, -0x70(%rbp)
  0x7f4b64000262:  8b 7d a8                 movl     -0x58(%rbp), %edi
  0x7f4b64000265:  48 be 70 c6 21 ff c0 55  movabsq  $0x55c0ff21c670, %rsi
  0x7f4b6400026d:  00 00
  0x7f4b6400026f:  e8 2b 42 c1 0d           callq    0x7f4b71c1449f
  0x7f4b64000274:  48 c7 85 30 01 00 00 04  movq     $0x400604, 0x130(%rbp)
  0x7f4b6400027c:  06 40 00
  0x7f4b6400027f:  48 c7 85 40 01 00 00 a0  movq     $0x4049a0, 0x140(%rbp)
  0x7f4b64000287:  49 40 00
  0x7f4b6400028a:  48 8b fd                 movq     %rbp, %rdi
  0x7f4b6400028d:  ff 15 15 00 00 00        callq    *0x15(%rip)
  0x7f4b64000293:  ff e0                    jmpq     *%rax
  0x7f4b64000295:  48 8d 05 a7 fd ff ff     leaq     -0x259(%rip), %rax
  0x7f4b6400029c:  e9 77 fd ff ff           jmp      0x7f4b64000018
    -- tb slow paths + alignment
  0x7f4b640002a1:  90                       nop      
  0x7f4b640002a2:  90                       nop      
  0x7f4b640002a3:  90                       nop      
  0x7f4b640002a4:  90                       nop      
  0x7f4b640002a5:  90                       nop      
  0x7f4b640002a6:  90                       nop      
  0x7f4b640002a7:  90                       nop      
    data: [size=8]
  0x7f4b640002a8:  .quad  0x000055c0feba1d00

  0, 0x4005d0, 0xd280001d, "movz x29, #0"
  0, 0x4005d4, 0xd280001e, "movz x30, #0"
  0, 0x4005d8, 0xaa0003e5, "mov x5, x0"
  0, 0x4005dc, 0xf94003e1, "ldr x1, [sp]", load, 0x55008000f0
  0, 0x4005e0, 0x910023e2, "add x2, sp, #8"
  0, 0x4005e4, 0x910003e6, "mov x6, sp"
  0, 0x4005e8, 0x90000000, "adrp x0, #0x400000"
  0, 0x4005ec, 0x91182000, "add x0, x0, #0x608"
  0, 0x4005f0, 0xb0000023, "adrp x3, #0x405000"
  0, 0x4005f4, 0x91014063, "add x3, x3, #0x50"
  0, 0x4005f8, 0xb0000024, "adrp x4, #0x405000"
  0, 0x4005fc, 0x91044084, "add x4, x4, #0x110"
  cpu_tb_exec: 0
  **
  ERROR:../../accel/tcg/cpu-exec.c:443:cpu_tb_exec: code should not be reached
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped



>> 
>> When digging through my other failure in `rr` I saw the cpu->plugin_mem_cbs
>> pointer changing from one non-null value to another (which also seems to
>> indicate it is not being cleared between instructions).
>> 
>> Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
>> getting optimized away, but not the code to set it in the first place?
>
> Is there anyone who could help take a look at this from the code gen
> perspective?
>
> -Aaron
Emilio Cota Dec. 18, 2022, 5:24 a.m. UTC | #12
On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
(snip)
> > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> > getting optimized away, but not the code to set it in the first place?
> 
> Is there anyone who could help take a look at this from the code gen
> perspective?

Thanks for the report. Just adding assertions was enough to uncover
several bugs. I did not reproduce the use-after-free, but by calling
reset from a callback it's easy to see how it can occur.

I have fixes in https://github.com/cota/qemu/tree/plugins

Can you please give those a try?

BTW I created an issue on gitlab to track this
  https://gitlab.com/qemu-project/qemu/-/issues/1381

Thanks,
		Emilio
Aaron Lindsay Dec. 19, 2022, 8:11 p.m. UTC | #13
Emilio,

On Dec 18 00:24, Emilio Cota wrote:
> On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
> (snip)
> > > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
> > > getting optimized away, but not the code to set it in the first place?
> > 
> > Is there anyone who could help take a look at this from the code gen
> > perspective?
> 
> Thanks for the report. Just adding assertions was enough to uncover
> several bugs. I did not reproduce the use-after-free, but by calling
> reset from a callback it's easy to see how it can occur.
> 
> I have fixes in https://github.com/cota/qemu/tree/plugins
> 
> Can you please give those a try?
> 
> BTW I created an issue on gitlab to track this
>   https://gitlab.com/qemu-project/qemu/-/issues/1381

Thanks so much for digging into this!

I rebased your plugins branch on top of v7.2.0 and tested with several
scenarios which reliably triggered the bug for me. None of them
reproduced the original problem (or hit other bugs!) with your fixes.

-Aaron
Alex Bennée Jan. 6, 2023, 10:30 a.m. UTC | #14
Aaron Lindsay <aaron@os.amperecomputing.com> writes:

> Emilio,
>
> On Dec 18 00:24, Emilio Cota wrote:
>> On Tue, Nov 29, 2022 at 15:37:51 -0500, Aaron Lindsay wrote:
>> (snip)
>> > > Does this hint that there are cases where reset cpu->plugin_mem_cbs to NULL is
>> > > getting optimized away, but not the code to set it in the first place?
>> > 
>> > Is there anyone who could help take a look at this from the code gen
>> > perspective?
>> 
>> Thanks for the report. Just adding assertions was enough to uncover
>> several bugs. I did not reproduce the use-after-free, but by calling
>> reset from a callback it's easy to see how it can occur.
>> 
>> I have fixes in https://github.com/cota/qemu/tree/plugins
>> 
>> Can you please give those a try?
>> 
>> BTW I created an issue on gitlab to track this
>>   https://gitlab.com/qemu-project/qemu/-/issues/1381
>
> Thanks so much for digging into this!
>
> I rebased your plugins branch on top of v7.2.0 and tested with several
> scenarios which reliably triggered the bug for me. None of them
> reproduced the original problem (or hit other bugs!) with your fixes.

Emilio,

Are you going to be able to post the patches soon? I'd like to get the
fixes in as early in the cycle as possible.

Thanks,
Emilio Cota Jan. 7, 2023, 3:07 a.m. UTC | #15
On Fri, Jan 6, 2023, 5:31 AM Alex Bennée <alex.bennee@linaro.org> wrote:

> Are you going to be able to post the patches soon? I'd like to get the
> fixes in as early in the cycle as possible.
>

I intend to post this series on Sunday.

Thanks,
Emilio
diff mbox series

Patch

diff --git a/plugins/core.c b/plugins/core.c
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -427,9 +427,14 @@  static bool free_dyn_cb_arr(void *p, uint32_t h, void *userp)

 void qemu_plugin_flush_cb(void)
 {
+    CPUState *cpu;
     qht_iter_remove(&plugin.dyn_cb_arr_ht, free_dyn_cb_arr, NULL);
     qht_reset(&plugin.dyn_cb_arr_ht);

+    CPU_FOREACH(cpu) {
+        cpu->plugin_mem_cbs = NULL;
+    }
+
     plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
 }