RISC-V: Add support for Ztso

Message ID 20220902034412.8918-1-palmer@rivosinc.com (mailing list archive)
State New, archived

Commit Message

Palmer Dabbelt Sept. 2, 2022, 3:44 a.m. UTC
Ztso, the RISC-V extension that provides the TSO memory model, was
recently frozen.  This provides support for Ztso on targets that are
themselves TSO.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

My first thought was to just add the TCG barriers to loads/stores and AMOs
as defined by Ztso, but after poking around a bit it seems that's
frowned upon by check_tcg_memory_orders_compatible().  I feel like the
indicated performance issues could probably be worked out, but this is
about the same amount of code and doesn't suffer from those performance
issues.  That said, it just seems wrong to couple targets to a RISC-V
feature.

This is also essentially untested, aside from poking around in the
generated device tree to make sure "_ztso" shows up when enabled.  I
don't think there's really any way to test it further, as we don't have
any TSO-enabled workloads and we were de facto providing TSO already on
x86 targets (which I'm assuming are what the vast majority of users are
running).
---
 target/riscv/cpu.c       | 12 ++++++++++++
 target/riscv/cpu.h       | 16 +++++++++++++++-
 target/riscv/translate.c |  6 ++++++
 tcg/i386/tcg-target.h    |  1 +
 tcg/s390x/tcg-target.h   |  1 +
 5 files changed, 35 insertions(+), 1 deletion(-)

Comments

Richard Henderson Sept. 4, 2022, 12:47 a.m. UTC | #1
On 9/2/22 04:44, Palmer Dabbelt wrote:
> -#define TCG_GUEST_DEFAULT_MO 0
> +/*
> + * RISC-V has two memory models: TSO is a bit weaker than Intel (MMIO and
> + * fetch), and WMO is approximately equivalent to Arm MCA.  Rather than
> + * enforcing orderings on most accesses, just default to the target memory
> + * order.
> + */
> +#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
> +# define TCG_GUEST_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> +#else
> +# define TCG_GUEST_DEFAULT_MO (0)
> +#endif

TCG_GUEST_DEFAULT_MO should be allowed to be variable.  Since I've not tried that, it may 
not work, but making sure that it does would be the first thing to do.

> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -236,6 +236,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>  #include "tcg/tcg-mo.h"
>  
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1

Um, no.  There's no need for this hackery...

> +#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
> +    /*
> +     * We only support Ztso on targets that themselves are already TSO, which
> +     * means there's no way to provide just RVWMO on those targets.  Instead
> +     * just default to telling the guest that Ztso is enabled.
> +     */
> +    DEFINE_PROP_BOOL("ztso", RISCVCPU, cfg.ext_ztso, true),
> +#endif

... you can just as well define the property at runtime, with a runtime check on 
TCG_TARGET_DEFAULT_MO.
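
As a rough illustration of the runtime check suggested here, the following is a standalone sketch rather than QEMU code. The helper names host_mo_covers() and ztso_default() are hypothetical, and the TCG_MO_* values are written from memory of tcg/tcg-mo.h, so they should be verified against the tree. The idea is a plain subset test: Ztso can be advertised only if every ordering TSO requires is already provided by the host.

```c
#include <stdbool.h>

/* Ordering bits named after tcg/tcg-mo.h (values are an assumption). */
enum {
    TCG_MO_LD_LD = 0x01,  /* load  followed by load  */
    TCG_MO_ST_LD = 0x02,  /* store followed by load  */
    TCG_MO_LD_ST = 0x04,  /* load  followed by store */
    TCG_MO_ST_ST = 0x08,  /* store followed by store */
    TCG_MO_ALL   = 0x0f,
};

/* TSO keeps every ordering except store->load, as in the patch. */
#define MO_TSO (TCG_MO_ALL & ~TCG_MO_ST_LD)

/* Hypothetical helper: true if the host provides all orderings the guest needs. */
static bool host_mo_covers(unsigned guest_mo, unsigned host_mo)
{
    return (guest_mo & ~host_mo) == 0;
}

/* Hypothetical helper: default value of a "ztso" property, decided at run time. */
static bool ztso_default(unsigned host_mo)
{
    return host_mo_covers(MO_TSO, host_mo);
}
```

With these definitions, ztso_default(TCG_MO_ALL & ~TCG_MO_ST_LD) is true (a host with the i386/s390x default from the patch), while ztso_default(0) is false (a host that guarantees no ordering by default).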

Though, honestly, I've had patches to add the required barriers sitting around for the 
last few releases, to better support things like x86 on aarch64.  I should just finish 
that up.


r~
Palmer Dabbelt Sept. 16, 2022, 12:52 p.m. UTC | #2
On Sat, 03 Sep 2022 17:47:54 PDT (-0700), richard.henderson@linaro.org wrote:
> On 9/2/22 04:44, Palmer Dabbelt wrote:
>> -#define TCG_GUEST_DEFAULT_MO 0
>> +/*
>> + * RISC-V has two memory models: TSO is a bit weaker than Intel (MMIO and
>> + * fetch), and WMO is approximately equivalent to Arm MCA.  Rather than
>> + * enforcing orderings on most accesses, just default to the target memory
>> + * order.
>> + */
>> +#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
>> +# define TCG_GUEST_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>> +#else
>> +# define TCG_GUEST_DEFAULT_MO (0)
>> +#endif
>
> TCG_GUEST_DEFAULT_MO should be allowed to be variable.  Since I've not tried that, it may
> not work, but making sure that it does would be the first thing to do.
>
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -236,6 +236,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>>  #include "tcg/tcg-mo.h"
>>
>>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
>
> Um, no.  There's no need for this hackery...
>
>> +#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
>> +    /*
>> +     * We only support Ztso on targets that themselves are already TSO, which
>> +     * means there's no way to provide just RVWMO on those targets.  Instead
>> +     * just default to telling the guest that Ztso is enabled.
>> +     */
>> +    DEFINE_PROP_BOOL("ztso", RISCVCPU, cfg.ext_ztso, true),
>> +#endif
>
> ... you can just as well define the property at runtime, with a runtime check on
> TCG_TARGET_DEFAULT_MO.
>
> Though, honestly, I've had patches to add the required barriers sitting around for the
> last few releases, to better support things like x86 on aarch64.  I should just finish
> that up.

I can just do that for the RISC-V TSO support?  Like the cover letter 
says, that was my first thought; it's only when I found the comment 
saying not to do it that I went this way.

>
>
> r~
Richard Henderson Sept. 17, 2022, 8:02 a.m. UTC | #3
On 9/16/22 14:52, Palmer Dabbelt wrote:
>> Though, honestly, I've had patches to add the required barriers sitting around for the
>> last few releases, to better support things like x86 on aarch64.  I should just finish
>> that up.
> 
> I can just do that for the RISC-V TSO support?  Like the cover letter says, that was my 
> first thought; it's only when I found the comment saying not to do it that I went this way.

My patches have the tcg optimizer inject the barriers automatically, rather than by hand, 
which is what the comment was trying to discourage.  The last version was

https://lore.kernel.org/qemu-devel/20210316220735.2048137-1-richard.henderson@linaro.org/
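
For orientation, here is a minimal sketch of how the set of barriers to inject could be derived mechanically from the guest and host memory-order masks. It is an assumption made for illustration, not code taken from the series linked above: required_mo() and the MO_BEFORE_* macros are hypothetical names, and the TCG_MO_* values are written from memory of tcg/tcg-mo.h. The orderings that need an explicit barrier before an access are those the access participates in, that the guest model demands, and that the host does not already give for free.

```c
/* Ordering bits named after tcg/tcg-mo.h (values are an assumption). */
enum {
    TCG_MO_LD_LD = 0x01,  /* load  followed by load  */
    TCG_MO_ST_LD = 0x02,  /* store followed by load  */
    TCG_MO_LD_ST = 0x04,  /* load  followed by store */
    TCG_MO_ST_ST = 0x08,  /* store followed by store */
    TCG_MO_ALL   = 0x0f,
};

/* Orderings in which the *later* access is a load, respectively a store. */
#define MO_BEFORE_LOAD  (TCG_MO_LD_LD | TCG_MO_ST_LD)
#define MO_BEFORE_STORE (TCG_MO_LD_ST | TCG_MO_ST_ST)

/*
 * Hypothetical helper: which orderings must be enforced with an explicit
 * barrier before an access.  A result of 0 means no barrier is needed.
 */
static unsigned required_mo(unsigned access_mo, unsigned guest_mo,
                            unsigned host_mo)
{
    return access_mo & guest_mo & ~host_mo;
}
```

For a TSO guest (TCG_MO_ALL & ~TCG_MO_ST_LD) on a host whose default mask is 0, this yields TCG_MO_LD_LD before each load and TCG_MO_LD_ST | TCG_MO_ST_ST before each store; on a host that is itself TSO it yields 0 in both cases, so no barriers would be emitted.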


r~
Palmer Dabbelt Sept. 17, 2022, 8:22 a.m. UTC | #4
On Sat, 17 Sep 2022 01:02:46 PDT (-0700), Richard Henderson wrote:
> On 9/16/22 14:52, Palmer Dabbelt wrote:
>>> Though, honestly, I've had patches to add the required barriers sitting around for the
>>> last few releases, to better support things like x86 on aarch64.  I should just finish
>>> that up.
>>
>> I can just do that for the RISC-V TSO support?  Like the cover letter says, that was my
>> first thought; it's only when I found the comment saying not to do it that I went this way.
>
> My patches have the tcg optimizer inject the barriers automatically, rather than by hand,
> which is what the comment was trying to discourage.  The last version was
>
> https://lore.kernel.org/qemu-devel/20210316220735.2048137-1-richard.henderson@linaro.org/

Thanks, I get it now.
Dr. David Alan Gilbert Sept. 29, 2022, 7:16 p.m. UTC | #5
* Palmer Dabbelt (palmer@rivosinc.com) wrote:
> Ztso, the RISC-V extension that provides the TSO memory model, was
> recently frozen.  This provides support for Ztso on targets that are
> themselves TSO.
> 
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> 
> ---
> 

> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> index 00fcbe297d..2a43d54fcd 100644
> --- a/tcg/i386/tcg-target.h
> +++ b/tcg/i386/tcg-target.h
> @@ -236,6 +236,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>  #include "tcg/tcg-mo.h"
>  
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1

Is x86's brand of memory ordering strong enough for Ztso?
I thought x86 had an optimisation where it was allowed to store forward
within the current CPU causing stores not to be quite strictly ordered.

Dave

>  #define TCG_TARGET_HAS_MEMORY_BSWAP  have_movbe
>  
> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> index 23e2063667..f423c124a0 100644
> --- a/tcg/s390x/tcg-target.h
> +++ b/tcg/s390x/tcg-target.h
> @@ -171,6 +171,7 @@ extern uint64_t s390_facilities[3];
>  #define TCG_TARGET_HAS_MEMORY_BSWAP   1
>  
>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
>  
>  static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>                                              uintptr_t jmp_rw, uintptr_t addr)
> -- 
> 2.34.1
> 
>
Palmer Dabbelt Oct. 2, 2022, 9:20 p.m. UTC | #6
On Thu, 29 Sep 2022 12:16:48 PDT (-0700), dgilbert@redhat.com wrote:
> * Palmer Dabbelt (palmer@rivosinc.com) wrote:
>> Ztso, the RISC-V extension that provides the TSO memory model, was
>> recently frozen.  This provides support for Ztso on targets that are
>> themselves TSO.
>>
>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>>
>> ---
>>
>
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 00fcbe297d..2a43d54fcd 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -236,6 +236,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>>  #include "tcg/tcg-mo.h"
>>
>>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
>
> Is x86's brand of memory ordering strong enough for Ztso?
> I thought x86 had an optimisation where it was allowed to store forward
> within the current CPU causing stores not to be quite strictly ordered.

I'm actually not sure: my understanding of the Intel memory model was 
that there's a bunch of subtle bits that don't match the various TSO 
formalizations, but the RISC-V folks are pretty adamant that Intel is 
exactly TSO.  I've gotten yelled at enough times on this one that I kind 
of just stopped caring, but that's not a good reason to have broken code 
so I'm happy to go fix it.

That said, when putting together the v2 (which has TCG barriers in the 
RISC-V front-end) I couldn't even really figure out how the TCG memory 
model works in any formal capacity -- I essentially just added the 
fences necessary for Ztso on RVWMO, but that's not a good proxy for Ztso 
on arm64 (and I guess not on x86, either).  Also happy to go take a 
crack at that one, but I'm not really a formal memory model person so it 
might not be the best result.

>
> Dave
>
>>  #define TCG_TARGET_HAS_MEMORY_BSWAP  have_movbe
>>
>> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
>> index 23e2063667..f423c124a0 100644
>> --- a/tcg/s390x/tcg-target.h
>> +++ b/tcg/s390x/tcg-target.h
>> @@ -171,6 +171,7 @@ extern uint64_t s390_facilities[3];
>>  #define TCG_TARGET_HAS_MEMORY_BSWAP   1
>>
>>  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
>> +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
>>
>>  static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
>>                                              uintptr_t jmp_rw, uintptr_t addr)
>> --
>> 2.34.1
>>
>>
Dr. David Alan Gilbert Oct. 3, 2022, 8:44 a.m. UTC | #7
* Palmer Dabbelt (palmer@rivosinc.com) wrote:
> On Thu, 29 Sep 2022 12:16:48 PDT (-0700), dgilbert@redhat.com wrote:
> > * Palmer Dabbelt (palmer@rivosinc.com) wrote:
> > > Ztso, the RISC-V extension that provides the TSO memory model, was
> > > recently frozen.  This provides support for Ztso on targets that are
> > > themselves TSO.
> > > 
> > > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > > 
> > > ---
> > > 
> > 
> > > diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
> > > index 00fcbe297d..2a43d54fcd 100644
> > > --- a/tcg/i386/tcg-target.h
> > > +++ b/tcg/i386/tcg-target.h
> > > @@ -236,6 +236,7 @@ static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
> > >  #include "tcg/tcg-mo.h"
> > > 
> > >  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> > > +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
> > 
> > Is x86's brand of memory ordering strong enough for Ztso?
> > I thought x86 had an optimisation where it was allowed to store forward
> > within the current CPU causing stores not to be quite strictly ordered.
> 
> I'm actually not sure: my understanding of the Intel memory model was that
> there's a bunch of subtle bits that don't match the various TSO
> formalizations, but the RISC-V folks are pretty adamant that Intel is
> exactly TSO.  I've gotten yelled at enough times on this one that I kind of
> just stopped caring, but that's not a good reason to have broken code so I'm
> happy to go fix it.

Many people make that mistake; please refer them to the Intel docs, the
big 'Intel 64 and IA-32 Architectures Software Developer's Manual,
Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4'.  In the recent
version I've got (April 2022), section 8.2 covers memory ordering, and
8.2.2 'Memory Ordering in P6 and More Recent Processor Families' says on
page 8-7 (page 3090 ish):

  In a multiple-processor system, the following ordering principles apply:
....
  Writes from an individual processor are NOT ordered with respect to the writes from other processors.
....
  Any two stores are seen in a consistent order by processors other than those performing the stores

then a bit further down, '8.2.3.5 Intra-Processor Forwarding Is Allowed'
has an example and says

    'The memory-ordering model allows concurrent stores by two processors to be seen in
    different orders by those two processors; specifically, each processor may perceive
    its own store occurring before that of the other.'

Having said that, I remember it's really difficult to trigger; it's ~10
years since I saw an example to trigger it, and I can't remember it.

> That said, when putting together the v2 (which has TCG barriers in the
> RISC-V front-end) I couldn't even really figure out how the TCG memory model
> works in any formal capacity -- I essentially just added the fences
> necessary for Ztso on RVWMO, but that's not a good proxy for Ztso on arm64
> (and I guess not on x86, either).  Also happy to go take a crack at that
> one, but I'm not really a formal memory model person so it might not be the
> best result.

Oh I don't know TCG's model, copying in Alex.

Dave

> > 
> > Dave
> > 
> > >  #define TCG_TARGET_HAS_MEMORY_BSWAP  have_movbe
> > > 
> > > diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> > > index 23e2063667..f423c124a0 100644
> > > --- a/tcg/s390x/tcg-target.h
> > > +++ b/tcg/s390x/tcg-target.h
> > > @@ -171,6 +171,7 @@ extern uint64_t s390_facilities[3];
> > >  #define TCG_TARGET_HAS_MEMORY_BSWAP   1
> > > 
> > >  #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
> > > +#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
> > > 
> > >  static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
> > >                                              uintptr_t jmp_rw, uintptr_t addr)
> > > --
> > > 2.34.1
> > > 
> > > 
>
Andrea Parri Oct. 13, 2022, 9:18 a.m. UTC | #8
> > > Is x86's brand of memory ordering strong enough for Ztso?
> > > I thought x86 had an optimisation where it was allowed to store forward
> > > within the current CPU causing stores not to be quite strictly ordered.

[...]

> then a bit further down, '8.2.3.5 Intra-Processor Forwarding Is Allowed'
> has an example and says
> 
>     'The memory-ordering model allows concurrent stores by two processors to be seen in
>     different orders by those two processors; specifically, each processor may perceive
>     its own store occurring before that of the other.'
> 
> Having said that, I remember it's really difficult to trigger; it's ~10
> years since I saw an example to trigger it, and I can't remember it.

AFAICT, Ztso allows the forwarding in question too.  Simulations with
the axiomatic formalization confirm such expectation:

RISCV intra-processor-forwarding
{
0:x5=1; 0:x6=x; 0:x8=y;
1:x5=1; 1:x6=y; 1:x8=x;
}
 P0          | P1          ;
 sw x5,0(x6) | sw x5,0(x6) ;
 lw x9,0(x6) | lw x9,0(x6) ;
 lw x7,0(x8) | lw x7,0(x8) ;
exists
(0:x7=0 /\ 1:x7=0 /\ 0:x9=1 /\ 1:x9=1)

Test intra-processor-forwarding Allowed
States 4
0:x7=0; 0:x9=1; 1:x7=0; 1:x9=1;
0:x7=0; 0:x9=1; 1:x7=1; 1:x9=1;
0:x7=1; 0:x9=1; 1:x7=0; 1:x9=1;
0:x7=1; 0:x9=1; 1:x7=1; 1:x9=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:x7=0 /\ 1:x7=0 /\ 0:x9=1 /\ 1:x9=1)
Observation intra-processor-forwarding Sometimes 1 3
Time intra-processor-forwarding 0.00
Hash=518e4b9b2f0770c94918ac5d7e311ba5
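
The same outcome can be reproduced with a toy operational model. The sketch below is not the axiomatic formalization used for the run above; it assumes the usual store-buffer picture of TSO (each hart has a FIFO store buffer, loads forward from the hart's own buffer, and buffered stores drain to memory at arbitrary times) and exhaustively explores all interleavings of this one litmus test. All names (State, load, explore, run_litmus) are hypothetical. Each final state is recorded as one bit in a bitmap, indexed by 0:x7 | 0:x9 << 1 | 1:x7 << 2 | 1:x9 << 3.

```c
#include <stdbool.h>

/* Toy TSO machine: two harts, each performing a single buffered store. */
typedef struct {
    int pc[2];         /* next instruction per hart (0..3) */
    bool buffered[2];  /* hart's store of 1 is still in its store buffer */
    int mem[2];        /* shared memory: mem[0] = x, mem[1] = y */
    int r9[2];         /* x9: load from the hart's own location */
    int r7[2];         /* x7: load from the other hart's location */
} State;

/* A load forwards from the hart's own store buffer when the address matches. */
static int load(const State *s, int hart, int addr)
{
    if (s->buffered[hart] && addr == hart) {
        return 1;
    }
    return s->mem[addr];
}

/* Depth-first exploration of every interleaving of steps and buffer drains. */
static void explore(State s, unsigned *seen)
{
    bool done = true;

    for (int h = 0; h < 2; h++) {
        if (s.buffered[h]) {            /* drain the store buffer to memory */
            State n = s;
            n.buffered[h] = false;
            n.mem[h] = 1;
            explore(n, seen);
            done = false;
        }
        if (s.pc[h] < 3) {              /* execute the next instruction */
            State n = s;
            switch (s.pc[h]) {
            case 0: n.buffered[h] = true;            break; /* sw x5,0(x6) */
            case 1: n.r9[h] = load(&s, h, h);        break; /* lw x9,0(x6) */
            case 2: n.r7[h] = load(&s, h, 1 - h);    break; /* lw x7,0(x8) */
            }
            n.pc[h]++;
            explore(n, seen);
            done = false;
        }
    }
    if (done) {                         /* both harts finished, buffers empty */
        int idx = s.r7[0] | (s.r9[0] << 1) | (s.r7[1] << 2) | (s.r9[1] << 3);
        *seen |= 1u << idx;
    }
}

/* Returns the bitmap of reachable final register states. */
static unsigned run_litmus(void)
{
    State init = {0};
    unsigned seen = 0;

    explore(init, &seen);
    return seen;
}
```

Under these assumptions run_litmus() returns 0xCC00, i.e. exactly the four states listed above; bit 10 is the state in the exists clause, where both harts read their own store (x9=1) but not yet the other hart's (x7=0).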

  Andrea
Dr. David Alan Gilbert Oct. 13, 2022, 9:59 a.m. UTC | #9
* Andrea Parri (andrea@rivosinc.com) wrote:
> > > > Is x86's brand of memory ordering strong enough for Ztso?
> > > > I thought x86 had an optimisation where it was allowed to store forward
> > > > within the current CPU causing stores not to be quite strictly ordered.
> 
> [...]
> 
> > then a bit further down, '8.2.3.5 Intra-Processor Forwarding Is Allowed'
> > has an example and says
> > 
> >     'The memory-ordering model allows concurrent stores by two processors to be seen in
> >     different orders by those two processors; specifically, each processor may perceive
> >     its own store occurring before that of the other.'
> > 
> > Having said that, I remember it's really difficult to trigger; it's ~10
> > years since I saw an example to trigger it, and I can't remember it.
> 
> AFAICT, Ztso allows the forwarding in question too.  Simulations with
> the axiomatic formalization confirm such expectation:

OK that seems to be what it says in:
https://five-embeddev.com/riscv-isa-manual/latest/ztso.html
  'In both of these memory models, it is the Load Value Axiom that allows a hart to
forward a value from its store buffer to a subsequent (in program order)
load—that is to say that stores can be forwarded locally before they are
visible to other harts'

> RISCV intra-processor-forwarding
> {
> 0:x5=1; 0:x6=x; 0:x8=y;
> 1:x5=1; 1:x6=y; 1:x8=x;
> }
>  P0          | P1          ;
>  sw x5,0(x6) | sw x5,0(x6) ;
>  lw x9,0(x6) | lw x9,0(x6) ;
>  lw x7,0(x8) | lw x7,0(x8) ;
> exists
> (0:x7=0 /\ 1:x7=0 /\ 0:x9=1 /\ 1:x9=1)

(I'm a bit fuzzy reading this...)
So is that the interesting case - where x7 is saying neither processor
saw the other processor's write yet, but they did see their own?


So from a qemu patch perspective, I think the important thing is that
the flag that's defined, is defined and commented in such a way that
it's obvious that local forwarding is allowed; we wouldn't want someone
emulating a stricter CPU (that doesn't allow local forwarding) to go and
use this flag as an indication that the host cpu is that strict.

Dave

> Test intra-processor-forwarding Allowed
> States 4
> 0:x7=0; 0:x9=1; 1:x7=0; 1:x9=1;
> 0:x7=0; 0:x9=1; 1:x7=1; 1:x9=1;
> 0:x7=1; 0:x9=1; 1:x7=0; 1:x9=1;
> 0:x7=1; 0:x9=1; 1:x7=1; 1:x9=1;
> Ok
> Witnesses
> Positive: 1 Negative: 3
> Condition exists (0:x7=0 /\ 1:x7=0 /\ 0:x9=1 /\ 1:x9=1)
> Observation intra-processor-forwarding Sometimes 1 3
> Time intra-processor-forwarding 0.00
> Hash=518e4b9b2f0770c94918ac5d7e311ba5
> 
>   Andrea
>
Andrea Parri Oct. 13, 2022, 10:25 a.m. UTC | #10
> > AFAICT, Ztso allows the forwarding in question too.  Simulations with
> > the axiomatic formalization confirm such expectation:
> 
> OK that seems to be what it says in:
> https://five-embeddev.com/riscv-isa-manual/latest/ztso.html
>   'In both of these memory models, it is the Load Value Axiom that allows a hart to
> forward a value from its store buffer to a subsequent (in program order)
> load—that is to say that stores can be forwarded locally before they are
> visible to other harts'

Indeed, thanks for the remark.


> > RISCV intra-processor-forwarding
> > {
> > 0:x5=1; 0:x6=x; 0:x8=y;
> > 1:x5=1; 1:x6=y; 1:x8=x;
> > }
> >  P0          | P1          ;
> >  sw x5,0(x6) | sw x5,0(x6) ;
> >  lw x9,0(x6) | lw x9,0(x6) ;
> >  lw x7,0(x8) | lw x7,0(x8) ;
> > exists
> > (0:x7=0 /\ 1:x7=0 /\ 0:x9=1 /\ 1:x9=1)
> 
> (I'm a bit fuzzy reading this...)
> So is that the interesting case - where x7 is saying neither processor
> saw the other processor's write yet, but they did see their own?

Right, it was inspired by the homonymous test in Intel's specs.

  Andrea

Patch

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ac6f82ebd0..d05b8c7c4a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -919,6 +919,15 @@  static Property riscv_cpu_extensions[] = {
     DEFINE_PROP_BOOL("zhinx", RISCVCPU, cfg.ext_zhinx, false),
     DEFINE_PROP_BOOL("zhinxmin", RISCVCPU, cfg.ext_zhinxmin, false),
 
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+    /*
+     * We only support Ztso on targets that themselves are already TSO, which
+     * means there's no way to provide just RVWMO on those targets.  Instead
+     * just default to telling the guest that Ztso is enabled.
+     */
+    DEFINE_PROP_BOOL("ztso", RISCVCPU, cfg.ext_ztso, true),
+#endif
+
     /* Vendor-specific custom extensions */
     DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, false),
 
@@ -1094,6 +1103,9 @@  static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int max_str_len)
         ISA_EDATA_ENTRY(zksed, ext_zksed),
         ISA_EDATA_ENTRY(zksh, ext_zksh),
         ISA_EDATA_ENTRY(zkt, ext_zkt),
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+        ISA_EDATA_ENTRY(ztso, ext_ztso),
+#endif
         ISA_EDATA_ENTRY(zve32f, ext_zve32f),
         ISA_EDATA_ENTRY(zve64f, ext_zve64f),
         ISA_EDATA_ENTRY(zhinx, ext_zhinx),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5c7acc055a..879e11a950 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -27,8 +27,19 @@ 
 #include "qom/object.h"
 #include "qemu/int128.h"
 #include "cpu_bits.h"
+#include "tcg-target.h"
 
-#define TCG_GUEST_DEFAULT_MO 0
+/*
+ * RISC-V has two memory models: TSO is a bit weaker than Intel (MMIO and
+ * fetch), and WMO is approximately equivalent to Arm MCA.  Rather than
+ * enforcing orderings on most accesses, just default to the target memory
+ * order.
+ */
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+# define TCG_GUEST_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
+#else
+# define TCG_GUEST_DEFAULT_MO (0)
+#endif
 
 /*
  * RISC-V-specific extra insn start words:
@@ -433,6 +444,9 @@  struct RISCVCPUConfig {
     bool ext_zve32f;
     bool ext_zve64f;
     bool ext_zmmul;
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+    bool ext_ztso;
+#endif
     bool rvv_ta_all_1s;
 
     uint32_t mvendorid;
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 63b04e8a94..00fd75b971 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -109,6 +109,9 @@  typedef struct DisasContext {
     /* PointerMasking extension */
     bool pm_mask_enabled;
     bool pm_base_enabled;
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+    bool ztso;
+#endif
     /* TCG of the current insn_start */
     TCGOp *insn_start;
 } DisasContext;
@@ -1109,6 +1112,9 @@  static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     memset(ctx->ftemp, 0, sizeof(ctx->ftemp));
     ctx->pm_mask_enabled = FIELD_EX32(tb_flags, TB_FLAGS, PM_MASK_ENABLED);
     ctx->pm_base_enabled = FIELD_EX32(tb_flags, TB_FLAGS, PM_BASE_ENABLED);
+#ifdef TCG_TARGET_SUPPORTS_MCTCG_RVTSO
+    ctx->ztso = cpu->cfg.ext_ztso;
+#endif
     ctx->zero = tcg_constant_tl(0);
 }
 
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 00fcbe297d..2a43d54fcd 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -236,6 +236,7 @@  static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
 #include "tcg/tcg-mo.h"
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
+#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP  have_movbe
 
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 23e2063667..f423c124a0 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -171,6 +171,7 @@  extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_MEMORY_BSWAP   1
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
+#define TCG_TARGET_SUPPORTS_MCTCG_RVTSO 1
 
 static inline void tb_target_set_jmp_target(uintptr_t tc_ptr, uintptr_t jmp_rx,
                                             uintptr_t jmp_rw, uintptr_t addr)