xen: Introduce cmpxchg64() and guest_cmpxchg64()

Message ID 20200815172143.1327-1-julien@xen.org (mailing list archive)
State New, archived
Series xen: Introduce cmpxchg64() and guest_cmpxchg64()

Commit Message

Julien Grall Aug. 15, 2020, 5:21 p.m. UTC
From: Julien Grall <jgrall@amazon.com>

The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
this is x86 code, but there are plans to make it common.

To cater for 32-bit architectures, introduce two new helpers to deal
with 64-bit cmpxchg.

The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).

Signed-off-by: Julien Grall <jgrall@amazon.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
 xen/include/asm-arm/arm32/cmpxchg.h | 68 +++++++++++++++++++++++++++++
 xen/include/asm-arm/arm64/cmpxchg.h |  5 +++
 xen/include/asm-arm/guest_atomics.h | 22 ++++++++++
 xen/include/asm-x86/guest_atomics.h |  2 +
 xen/include/asm-x86/x86_64/system.h |  2 +
 5 files changed, 99 insertions(+)

Comments

Oleksandr Tyshchenko Aug. 16, 2020, 7:26 p.m. UTC | #1
On 15.08.20 20:21, Julien Grall wrote:

Hi Julien

> From: Julien Grall <jgrall@amazon.com>
>
> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> this is x86 code, but there are plans to make it common.
>
> To cater for 32-bit architectures, introduce two new helpers to deal
> with 64-bit cmpxchg.
>
> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>   xen/include/asm-arm/arm32/cmpxchg.h | 68 +++++++++++++++++++++++++++++
>   xen/include/asm-arm/arm64/cmpxchg.h |  5 +++
>   xen/include/asm-arm/guest_atomics.h | 22 ++++++++++
>   xen/include/asm-x86/guest_atomics.h |  2 +
>   xen/include/asm-x86/x86_64/system.h |  2 +
>   5 files changed, 99 insertions(+)

Thank you for doing this. I have successfully build-tested your patch
together with the IOREQ series on Arm32/Arm64 (of course, before that I
changed cmpxchg() to guest_cmpxchg64() in hvm_send_buffered_ioreq()).
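
For reference, the change described would presumably look along these
lines (an illustrative sketch, not the actual hunk; pg->ptrs.full is
the 64-bit union of the buffered ioreq ring pointers, and d the target
domain):

    -        cmpxchg(&pg->ptrs.full, old.full, new.full);
    +        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);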
Roger Pau Monné Aug. 17, 2020, 9:24 a.m. UTC | #2
On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> this is x86 code, but there are plans to make it common.
>
> To cater for 32-bit architectures, introduce two new helpers to deal
> with 64-bit cmpxchg.
> 
> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
> index 029417c8ffc1..f4de9d3631ff 100644
> --- a/xen/include/asm-x86/guest_atomics.h
> +++ b/xen/include/asm-x86/guest_atomics.h
> @@ -20,6 +20,8 @@
>      ((void)(d), test_and_change_bit(nr, p))
>  
>  #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
> +
>  
>  #endif /* _X86_GUEST_ATOMICS_H */
>  /*
> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> index f471859c19cc..c1b16105e9f2 100644
> --- a/xen/include/asm-x86/x86_64/system.h
> +++ b/xen/include/asm-x86/x86_64/system.h
> @@ -5,6 +5,8 @@
>      ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>                                     (unsigned long)(n),sizeof(*(ptr))))
>  
> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)

Why do you need to introduce an explicitly sized version of cmpxchg
for 64bit values?

There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
handled by cmpxchg detecting the size of the parameter passed to the
function. I think it's worth adding to the commit message why such a
differentiated helper is needed.

Thanks, Roger.
Julien Grall Aug. 17, 2020, 9:42 a.m. UTC | #3
Hi,

On 17/08/2020 10:24, Roger Pau Monné wrote:
> On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
>> this is x86 code, but there are plans to make it common.
>>
>> To cater for 32-bit architectures, introduce two new helpers to deal
>> with 64-bit cmpxchg.
>>
>> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
>> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
>> index 029417c8ffc1..f4de9d3631ff 100644
>> --- a/xen/include/asm-x86/guest_atomics.h
>> +++ b/xen/include/asm-x86/guest_atomics.h
>> @@ -20,6 +20,8 @@
>>       ((void)(d), test_and_change_bit(nr, p))
>>   
>>   #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
>> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
>> +
>>   
>>   #endif /* _X86_GUEST_ATOMICS_H */
>>   /*
>> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
>> index f471859c19cc..c1b16105e9f2 100644
>> --- a/xen/include/asm-x86/x86_64/system.h
>> +++ b/xen/include/asm-x86/x86_64/system.h
>> @@ -5,6 +5,8 @@
>>       ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>>                                      (unsigned long)(n),sizeof(*(ptr))))
>>   
>> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> 
> Why do you need to introduce an explicitly sized version of cmpxchg
> for 64bit values?
> 
> There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
> handled by cmpxchg detecting the size of the parameter passed to the
> function.
That works quite well for 64-bit arches. However, for 32-bit, you would 
need to take a detour so 32-bit and 64-bit values can coexist (you cannot
simply replace unsigned long with uint64_t).
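
To see why: the existing Arm32 macro funnels both values through
unsigned long, which is 32-bit there, so a 64-bit comparand would be
truncated before it ever reaches the helper (and sizeof(*(ptr)) == 8
has no matching case in __int_cmpxchg(), so such a use breaks the
build):

	#define cmpxchg(ptr,o,n)					\
		((__typeof__(*(ptr)))__cmpxchg_mb((ptr),		\
						  (unsigned long)(o),	\
						  (unsigned long)(n),	\
						  sizeof(*(ptr))))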

I couldn't come up with a way to do it. Do you have any suggestion?

Cheers,
Roger Pau Monné Aug. 17, 2020, 10:33 a.m. UTC | #4
On Mon, Aug 17, 2020 at 10:42:54AM +0100, Julien Grall wrote:
> Hi,
> 
> On 17/08/2020 10:24, Roger Pau Monné wrote:
> > On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
> > > From: Julien Grall <jgrall@amazon.com>
> > > 
> > > The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> > > this is x86 code, but there are plans to make it common.
> > >
> > > To cater for 32-bit architectures, introduce two new helpers to deal
> > > with 64-bit cmpxchg.
> > > 
> > > The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> > > in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
> > > 
> > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > ---
> > > diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
> > > index 029417c8ffc1..f4de9d3631ff 100644
> > > --- a/xen/include/asm-x86/guest_atomics.h
> > > +++ b/xen/include/asm-x86/guest_atomics.h
> > > @@ -20,6 +20,8 @@
> > >       ((void)(d), test_and_change_bit(nr, p))
> > >   #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
> > > +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
> > > +
> > >   #endif /* _X86_GUEST_ATOMICS_H */
> > >   /*
> > > diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> > > index f471859c19cc..c1b16105e9f2 100644
> > > --- a/xen/include/asm-x86/x86_64/system.h
> > > +++ b/xen/include/asm-x86/x86_64/system.h
> > > @@ -5,6 +5,8 @@
> > >       ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
> > >                                      (unsigned long)(n),sizeof(*(ptr))))
> > > +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> > 
> > Why do you need to introduce an explicitly sized version of cmpxchg
> > for 64bit values?
> > 
> > There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
> > handled by cmpxchg detecting the size of the parameter passed to the
> > function.
> That works quite well for 64-bit arches. However, for 32-bit, you would need
> to take a detour so 32-bit and 64-bit values can coexist (you cannot simply
> replace unsigned long with uint64_t).

Oh, I see. Switching __cmpxchg on Arm 32 to use unsigned long long or
uint64_t would be bad, as you would then need two registers to pass
the value to the function, or push it on the stack?

Maybe do something like:

#define cmpxchg(ptr,o,n) ({						\
	typeof(*(ptr)) tmp;						\
									\
	switch ( sizeof(*(ptr)) )					\
	{								\
	case 8:								\
		tmp = __cmpxchg_mb64((ptr), (uint64_t)(o),		\
				(uint64_t)(n), sizeof(*(ptr)));		\
		break;							\
	default:							\
		tmp = __cmpxchg_mb((ptr), (unsigned long)(o),		\
				(unsigned long)(n), sizeof(*(ptr)));	\
		break;							\
	}								\
	tmp;								\
})

Roger.
Julien Grall Aug. 17, 2020, 11:05 a.m. UTC | #5
Hi,

On 17/08/2020 11:33, Roger Pau Monné wrote:
> On Mon, Aug 17, 2020 at 10:42:54AM +0100, Julien Grall wrote:
>> Hi,
>>
>> On 17/08/2020 10:24, Roger Pau Monné wrote:
>>> On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
>>>> this is x86 code, but there are plans to make it common.
>>>>
>>>> To cater for 32-bit architectures, introduce two new helpers to deal
>>>> with 64-bit cmpxchg.
>>>>
>>>> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
>>>> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
>>>>
>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
>>>> index 029417c8ffc1..f4de9d3631ff 100644
>>>> --- a/xen/include/asm-x86/guest_atomics.h
>>>> +++ b/xen/include/asm-x86/guest_atomics.h
>>>> @@ -20,6 +20,8 @@
>>>>        ((void)(d), test_and_change_bit(nr, p))
>>>>    #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
>>>> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
>>>> +
>>>>    #endif /* _X86_GUEST_ATOMICS_H */
>>>>    /*
>>>> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
>>>> index f471859c19cc..c1b16105e9f2 100644
>>>> --- a/xen/include/asm-x86/x86_64/system.h
>>>> +++ b/xen/include/asm-x86/x86_64/system.h
>>>> @@ -5,6 +5,8 @@
>>>>        ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>>>>                                       (unsigned long)(n),sizeof(*(ptr))))
>>>> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
>>>
>>> Why do you need to introduce an explicitly sized version of cmpxchg
>>> for 64bit values?
>>>
>>> There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
>>> handled by cmpxchg detecting the size of the parameter passed to the
>>> function.
>> That works quite well for 64-bit arches. However, for 32-bit, you would need
>> to take a detour so 32-bit and 64-bit values can coexist (you cannot simply
>> replace unsigned long with uint64_t).
> 
> Oh, I see. Switching __cmpxchg on Arm 32 to use unsigned long long or
> uint64_t would be bad, as you would then need two registers to pass
> the value to the function, or push it on the stack?

We have only 4 registers (r0 - r3) available for the arguments. With a
64-bit value, we will be using 2 registers, so some arguments will end
up being pushed on the stack.

This is assuming the compiler is not clever enough to see we are only
using the bottom 32 bits with some cmpxchg() calls.

> 
> Maybe do something like:
> 
> #define cmpxchg(ptr,o,n) ({						\
> 	typeof(*(ptr)) tmp;						\
> 									\
> 	switch ( sizeof(*(ptr)) )					\
> 	{								\
> 	case 8:								\
> 		tmp = __cmpxchg_mb64((ptr), (uint64_t)(o),		\
> 				(uint64_t)(n), sizeof(*(ptr)));		\
> 		break;							\
> 	default:							\
> 		tmp = __cmpxchg_mb((ptr), (unsigned long)(o),		\
> 				(unsigned long)(n), sizeof(*(ptr)));	\
> 		break;							\
> 	}								\
> 	tmp;								\
> })


Unfortunately this can't compile if o and n are pointers because the 
compiler will complain about the cast to uint64_t.

We would also need a cast when assigning to tmp because tmp may not be a 
scalar type. This would lead to the same compiler issue.

The only way I could see to make it work would be to use the same trick 
as we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using 
union and void pointer to prevent explicit cast.
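
A rough sketch of that trick applied here (hypothetical name
__cmpxchg_size(), built on the helpers from this patch, assuming
little-endian and GCC's zero-filling of the rest of a partially
initialised union; untested):

	static always_inline void __cmpxchg_size(volatile void *ptr,
						 void *old, const void *new,
						 int size)
	{
		switch ( size )
		{
		case 8:
			/* The 64-bit values travel via pointers, so no
			 * scalar cast of o/n is needed anywhere. */
			smp_mb();
			__cmpxchg_case_8(ptr, old,
					 *(const uint64_t *)new, false, 0);
			smp_mb();
			break;
		default:
			*(unsigned long *)old =
				__cmpxchg_mb(ptr, *(const unsigned long *)old,
					     *(const unsigned long *)new,
					     size);
			break;
		}
	}

	#define cmpxchg(ptr, o, n) ({					\
		union { typeof(*(ptr)) val; char c[8]; }		\
			_o = { .val = (o) }, _n = { .val = (n) };	\
									\
		__cmpxchg_size((ptr), _o.c, _n.c, sizeof(*(ptr)));	\
		_o.val;							\
	})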

But I am not sure whether the effort is really worth it.

Cheers,
Roger Pau Monné Aug. 17, 2020, 11:50 a.m. UTC | #6
On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
> Hi,
> 
> On 17/08/2020 11:33, Roger Pau Monné wrote:
> > On Mon, Aug 17, 2020 at 10:42:54AM +0100, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 17/08/2020 10:24, Roger Pau Monné wrote:
> > > > On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
> > > > > From: Julien Grall <jgrall@amazon.com>
> > > > > 
> > > > > The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> > > > > this is x86 code, but there are plans to make it common.
> > > > >
> > > > > To cater for 32-bit architectures, introduce two new helpers to deal
> > > > > with 64-bit cmpxchg.
> > > > > 
> > > > > The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> > > > > in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
> > > > > 
> > > > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > > > Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > ---
> > > > > diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
> > > > > index 029417c8ffc1..f4de9d3631ff 100644
> > > > > --- a/xen/include/asm-x86/guest_atomics.h
> > > > > +++ b/xen/include/asm-x86/guest_atomics.h
> > > > > @@ -20,6 +20,8 @@
> > > > >        ((void)(d), test_and_change_bit(nr, p))
> > > > >    #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
> > > > > +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
> > > > > +
> > > > >    #endif /* _X86_GUEST_ATOMICS_H */
> > > > >    /*
> > > > > diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> > > > > index f471859c19cc..c1b16105e9f2 100644
> > > > > --- a/xen/include/asm-x86/x86_64/system.h
> > > > > +++ b/xen/include/asm-x86/x86_64/system.h
> > > > > @@ -5,6 +5,8 @@
> > > > >        ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
> > > > >                                       (unsigned long)(n),sizeof(*(ptr))))
> > > > > +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> > > > 
> > > > Why do you need to introduce an explicitly sized version of cmpxchg
> > > > for 64bit values?
> > > > 
> > > > There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
> > > > handled by cmpxchg detecting the size of the parameter passed to the
> > > > function.
> > > That works quite well for 64-bit arches. However, for 32-bit, you would need
> > > to take a detour so 32-bit and 64-bit values can coexist (you cannot simply
> > > replace unsigned long with uint64_t).
> > 
> > Oh, I see. Switching __cmpxchg on Arm 32 to use unsigned long long or
> > uint64_t would be bad, as you would then need two registers to pass
> > the value to the function, or push it on the stack?
> 
> We have only 4 registers (r0 - r3) available for the arguments. With a
> 64-bit value, we will be using 2 registers, so some arguments will end
> up being pushed on the stack.
> 
> This is assuming the compiler is not clever enough to see we are only
> using the bottom 32 bits with some cmpxchg() calls.
> 
> > 
> > Maybe do something like:
> > 
> > #define cmpxchg(ptr,o,n) ({						\
> > 	typeof(*(ptr)) tmp;						\
> > 									\
> > 	switch ( sizeof(*(ptr)) )					\
> > 	{								\
> > 	case 8:								\
> > 		tmp = __cmpxchg_mb64((ptr), (uint64_t)(o),		\
> > 				(uint64_t)(n), sizeof(*(ptr)));		\
> > 		break;							\
> > 	default:							\
> > 		tmp = __cmpxchg_mb((ptr), (unsigned long)(o),		\
> > 				(unsigned long)(n), sizeof(*(ptr)));	\
> > 		break;							\
> > 	}								\
> > 	tmp;								\
> > })
> 
> 
> Unfortunately this can't compile if o and n are pointers because the
> compiler will complain about the cast to uint64_t.

Right, we would have to cast to unsigned long first and then to
uint64_t, which is not very nice.

> 
> We would also need a cast when assigning to tmp because tmp may not be a
> scalar type. This would lead to the same compiler issue.

Yes, we would have to do a bunch of casts.

> The only way I could see to make it work would be to use the same trick as
> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
> union and a void pointer to avoid explicit casts.

I'm mostly worried about common code having assumed that cmpxchg
does also handle 64bit sized parameters, and thus failing to use
cmpxchg64 when required. I assume this is not much of a deal as then
the Arm 32 build would fail, so it should be fairly easy to catch
those.

I don't think the union is so bad, but let's wait to see what others
think.

FWIW x86 already has a specific handler for 128bit values: cmpxchg16b.
Maybe it would be better to name this cmpxchg8b? Or rename the
existing one to cmpxchg128 for coherence.

Thanks, Roger.
Julien Grall Aug. 17, 2020, 1:03 p.m. UTC | #7
On 17/08/2020 12:50, Roger Pau Monné wrote:
> On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
>> Hi,
>>
>> On 17/08/2020 11:33, Roger Pau Monné wrote:
>>> On Mon, Aug 17, 2020 at 10:42:54AM +0100, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 17/08/2020 10:24, Roger Pau Monné wrote:
>>>>> On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
>>>>>> From: Julien Grall <jgrall@amazon.com>
>>>>>>
>>>>>> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
>>>>>> this is x86 code, but there are plans to make it common.
>>>>>>
>>>>>> To cater for 32-bit architectures, introduce two new helpers to deal
>>>>>> with 64-bit cmpxchg.
>>>>>>
>>>>>> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
>>>>>> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
>>>>>>
>>>>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>>>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>>>> ---
>>>>>> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
>>>>>> index 029417c8ffc1..f4de9d3631ff 100644
>>>>>> --- a/xen/include/asm-x86/guest_atomics.h
>>>>>> +++ b/xen/include/asm-x86/guest_atomics.h
>>>>>> @@ -20,6 +20,8 @@
>>>>>>         ((void)(d), test_and_change_bit(nr, p))
>>>>>>     #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
>>>>>> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
>>>>>> +
>>>>>>     #endif /* _X86_GUEST_ATOMICS_H */
>>>>>>     /*
>>>>>> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
>>>>>> index f471859c19cc..c1b16105e9f2 100644
>>>>>> --- a/xen/include/asm-x86/x86_64/system.h
>>>>>> +++ b/xen/include/asm-x86/x86_64/system.h
>>>>>> @@ -5,6 +5,8 @@
>>>>>>         ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>>>>>>                                        (unsigned long)(n),sizeof(*(ptr))))
>>>>>> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
>>>>>
>>>>> Why do you need to introduce an explicitly sized version of cmpxchg
>>>>> for 64bit values?
>>>>>
>>>>> There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
>>>>> handled by cmpxchg detecting the size of the parameter passed to the
>>>>> function.
>>>> That works quite well for 64-bit arches. However, for 32-bit, you would need
>>>> to take a detour so 32-bit and 64-bit values can coexist (you cannot simply
>>>> replace unsigned long with uint64_t).
>>>
>>> Oh, I see. Switching __cmpxchg on Arm 32 to use unsigned long long or
>>> uint64_t would be bad, as you would then need two registers to pass
>>> the value to the function, or push it on the stack?
>>
>> We have only 4 registers (r0 - r3) available for the arguments. With a
>> 64-bit value, we will be using 2 registers, so some arguments will end
>> up being pushed on the stack.
>>
>> This is assuming the compiler is not clever enough to see we are only
>> using the bottom 32 bits with some cmpxchg() calls.
>>
>>>
>>> Maybe do something like:
>>>
>>> #define cmpxchg(ptr,o,n) ({						\
>>> 	typeof(*(ptr)) tmp;						\
>>> 									\
>>> 	switch ( sizeof(*(ptr)) )					\
>>> 	{								\
>>> 	case 8:								\
>>> 		tmp = __cmpxchg_mb64((ptr), (uint64_t)(o),		\
>>> 				(uint64_t)(n), sizeof(*(ptr)));		\
>>> 		break;							\
>>> 	default:							\
>>> 		tmp = __cmpxchg_mb((ptr), (unsigned long)(o),		\
>>> 				(unsigned long)(n), sizeof(*(ptr)));	\
>>> 		break;							\
>>> 	}								\
>>> 	tmp;								\
>>> })
>>
>>
>> Unfortunately this can't compile if o and n are pointers because the
>> compiler will complain about the cast to uint64_t.
> 
> Right, we would have to cast to unsigned long first and then to
> uint64_t, which is not very nice.

If you use (uint64_t)(unsigned long) in the 64-bit case, then you would 
lose the top 32 bits. So cmpxchg() wouldn't work as expected.

> 
>>
>> We would also need a cast when assigning to tmp because tmp may not be a
>> scalar type. This would lead to the same compiler issue.
> 
> Yes, we would have to do a bunch of casts.

I don't think there is a way to solve this using just casts.

> 
>> The only way I could see to make it work would be to use the same trick as
>> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
>> union and a void pointer to avoid explicit casts.
> 
> I'm mostly worried about common code having assumed that cmpxchg
> does also handle 64bit sized parameters, and thus failing to use
> cmpxchg64 when required. I assume this is not much of a deal as then
> the Arm 32 build would fail, so it should be fairly easy to catch
> those.
FWIW, this is not very different to the existing approach. If one would 
use cmpxchg() with 64-bit, then it would fail to compile.

Furthermore, there is no guarantee that a new 32-bit arch would have
64-bit atomic operations. For instance, not all 32-bit Arm processors
have 64-bit atomic operations (LDREXD/STREXD only exist from ARMv6K
onwards), although the ones supporting virtualization will have them.

So I think we will always have to rely on review and build testing to
catch errors.

> 
> I don't think the union is so bad, but let's wait to see what others
> think.

I am not concerned about the code itself but the assembly generated. I
don't want to increase the number of memory accesses or instructions
just for the sake of trying to get cmpxchg() to work with 64-bit values.

I will have to implement it and see the code generated.

> 
> FWIW x86 already has a specific handler for 128bit values: cmpxchg16b.
> Maybe it would be better to name this cmpxchg8b? Or rename the
> existing one to cmpxchg128 for coherence.

I dislike the name cmpxchg8b(). It is much easier to match the type
and the name with cmpxchg64().

I would be happy to rename cmpxchg16b() if the x86 folks would want it.

Cheers,
Roger Pau Monné Aug. 17, 2020, 2:20 p.m. UTC | #8
On Mon, Aug 17, 2020 at 02:03:23PM +0100, Julien Grall wrote:
> 
> 
> On 17/08/2020 12:50, Roger Pau Monné wrote:
> > On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
> > > Hi,
> > > 
> > > On 17/08/2020 11:33, Roger Pau Monné wrote:
> > > > On Mon, Aug 17, 2020 at 10:42:54AM +0100, Julien Grall wrote:
> > > > > Hi,
> > > > > 
> > > > > On 17/08/2020 10:24, Roger Pau Monné wrote:
> > > > > > On Sat, Aug 15, 2020 at 06:21:43PM +0100, Julien Grall wrote:
> > > > > > > From: Julien Grall <jgrall@amazon.com>
> > > > > > > 
> > > > > > > The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> > > > > > > this is x86 code, but there are plans to make it common.
> > > > > > >
> > > > > > > To cater for 32-bit architectures, introduce two new helpers to deal
> > > > > > > with 64-bit cmpxchg.
> > > > > > > 
> > > > > > > The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> > > > > > > in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
> > > > > > > 
> > > > > > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > > > > > Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> > > > > > > ---
> > > > > > > diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
> > > > > > > index 029417c8ffc1..f4de9d3631ff 100644
> > > > > > > --- a/xen/include/asm-x86/guest_atomics.h
> > > > > > > +++ b/xen/include/asm-x86/guest_atomics.h
> > > > > > > @@ -20,6 +20,8 @@
> > > > > > >         ((void)(d), test_and_change_bit(nr, p))
> > > > > > >     #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
> > > > > > > +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
> > > > > > > +
> > > > > > >     #endif /* _X86_GUEST_ATOMICS_H */
> > > > > > >     /*
> > > > > > > diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> > > > > > > index f471859c19cc..c1b16105e9f2 100644
> > > > > > > --- a/xen/include/asm-x86/x86_64/system.h
> > > > > > > +++ b/xen/include/asm-x86/x86_64/system.h
> > > > > > > @@ -5,6 +5,8 @@
> > > > > > >         ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
> > > > > > >                                        (unsigned long)(n),sizeof(*(ptr))))
> > > > > > > +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> > > > > > 
> > > > > > Why do you need to introduce an explicitly sized version of cmpxchg
> > > > > > for 64bit values?
> > > > > > 
> > > > > > There's no cmpxchg{8,16,32}, so I would expect cmpxchg64 to just be
> > > > > > handled by cmpxchg detecting the size of the parameter passed to the
> > > > > > function.
> > > > > That works quite well for 64-bit arches. However, for 32-bit, you would need
> > > > > to take a detour so 32-bit and 64-bit values can coexist (you cannot simply
> > > > > replace unsigned long with uint64_t).
> > > > 
> > > > Oh, I see. Switching __cmpxchg on Arm 32 to use unsigned long long or
> > > > uint64_t would be bad, as you would then need two registers to pass
> > > > the value to the function, or push it on the stack?
> > > 
> > > We have only 4 registers (r0 - r3) available for the arguments. With a
> > > 64-bit value, we will be using 2 registers, so some arguments will end
> > > up being pushed on the stack.
> > > 
> > > This is assuming the compiler is not clever enough to see we are only
> > > using the bottom 32 bits with some cmpxchg() calls.
> > > 
> > > > 
> > > > Maybe do something like:
> > > > 
> > > > #define cmpxchg(ptr,o,n) ({						\
> > > > 	typeof(*(ptr)) tmp;						\
> > > > 									\
> > > > 	switch ( sizeof(*(ptr)) )					\
> > > > 	{								\
> > > > 	case 8:								\
> > > > 		tmp = __cmpxchg_mb64((ptr), (uint64_t)(o),		\
> > > > 				(uint64_t)(n), sizeof(*(ptr)));		\
> > > > 		break;							\
> > > > 	default:							\
> > > > 		tmp = __cmpxchg_mb((ptr), (unsigned long)(o),		\
> > > > 				(unsigned long)(n), sizeof(*(ptr)));	\
> > > > 		break;							\
> > > > 	}								\
> > > > 	tmp;								\
> > > > })
> > > 
> > > 
> > > Unfortunately this can't compile if o and n are pointers because the
> > > compiler will complain about the cast to uint64_t.
> > 
> > Right, we would have to cast to unsigned long first and then to
> > uint64_t, which is not very nice.
> 
> If you use (uint64_t)(unsigned long) in the 64-bit case, then you would lose
> the top 32 bits. So cmpxchg() wouldn't work as expected.
> 
> > 
> > > 
> > > We would also need a cast when assigning to tmp because tmp may not be a
> > > scalar type. This would lead to the same compiler issue.
> > 
> > Yes, we would have to do a bunch of casts.
> 
> I don't think there is a way to solve this using just casts.

Right. I certainly failed to see it.

> > 
> > I don't think the union is so bad, but let's wait to see what others
> > think.
> 
> I am not concerned about the code itself but the assembly generated. I don't
> want to increase the number of memory accesses or instructions just for the
> sake of trying to get cmpxchg() to work with 64-bit values.
> 
> I will have to implement it and see the code generated.

Since we already have a 128bit version I don't think there's a problem
in introducing a 64bit version (and forcing its usage in common
code).

> > 
> > FWIW x86 already has a specific handler for 128bit values: cmpxchg16b.
> > Maybe it would be better to name this cmpxchg8b? Or rename the
> > existing one to cmpxchg128 for coherence.
> 
> I dislike the name cmpxchg8b(). It is much easier to match the type and
> the name with cmpxchg64().
> 
> I would be happy to rename cmpxchg16b() if the x86 folks would want it.

That would be fine by me.
Stefano Stabellini Aug. 17, 2020, 10:56 p.m. UTC | #9
On Sat, 15 Aug 2020, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
> this is x86 code, but there are plans to make it common.
> 
> To cater for 32-bit architectures, introduce two new helpers to deal
> with 64-bit cmpxchg.
> 
> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
>  xen/include/asm-arm/arm32/cmpxchg.h | 68 +++++++++++++++++++++++++++++
>  xen/include/asm-arm/arm64/cmpxchg.h |  5 +++
>  xen/include/asm-arm/guest_atomics.h | 22 ++++++++++
>  xen/include/asm-x86/guest_atomics.h |  2 +
>  xen/include/asm-x86/x86_64/system.h |  2 +
>  5 files changed, 99 insertions(+)
> 
> diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
> index 0770f272ee99..5e2fa6ee38a0 100644
> --- a/xen/include/asm-arm/arm32/cmpxchg.h
> +++ b/xen/include/asm-arm/arm32/cmpxchg.h
> @@ -87,6 +87,38 @@ __CMPXCHG_CASE(b, 1)
>  __CMPXCHG_CASE(h, 2)
>  __CMPXCHG_CASE( , 4)
>  
> +static inline bool __cmpxchg_case_8(volatile uint64_t *ptr,
> +			 	    uint64_t *old,
> +			 	    uint64_t new,
> +			 	    bool timeout,
> +				    unsigned int max_try)
> +{
> +	uint64_t oldval;
> +	uint64_t res;
> +
> +	do {
> +		asm volatile(
> +		"	ldrexd		%1, %H1, [%3]\n"
> +		"	teq		%1, %4\n"
> +		"	teqeq		%H1, %H4\n"
> +		"	movne		%0, #0\n"
> +		"	movne		%H0, #0\n"
> +		"	bne		2f\n"
> +		"	strexd		%0, %5, %H5, [%3]\n"
> +		"	teq		%0, #0\n"

Apologies if I am misreading this code, but this last "teq" instruction
doesn't seem to be useful?


> +		"2:"
> +		: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
                                              ^ not used ?


> +		: "r" (ptr), "r" (*old), "r" (new)
> +		: "memory", "cc");
> +		if (!res)
> +			break;
> +	} while (!timeout || ((--max_try) > 0));
> +
> +	*old = oldval;
> +
> +	return !res;
> +}
> +
>  static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
>  					unsigned long new, int size,
>  					bool timeout, unsigned int max_try)
> @@ -156,6 +188,30 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>  	return ret;
>  }
>  
> +/*
> + * The helper may fail to update the memory if the action takes too long.
> + *
> + * @old: On call the value pointed contains the expected old value. It will be
> + * updated to the actual old value.
> + * @max_try: Maximum number of iterations
> + *
> + * The helper will return true when the update has succeeded (i.e no
> + * timeout) and false if the update has failed.
> + */
> +static always_inline bool __cmpxchg64_mb_timeout(volatile uint64_t *ptr,
> +						 uint64_t *old,
> +						 uint64_t new,
> +						 unsigned int max_try)
> +{
> +	bool ret;
> +
> +	smp_mb();
> +	ret = __cmpxchg_case_8(ptr, old, new, true, max_try);
> +	smp_mb();
> +
> +	return ret;
> +}
> +
>  #define cmpxchg(ptr,o,n)						\
>  	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
>  					  (unsigned long)(o),		\
> @@ -167,6 +223,18 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>  				       (unsigned long)(o),		\
>  				       (unsigned long)(n),		\
>  				       sizeof(*(ptr))))
> +
> +static inline uint64_t cmpxchg64(volatile uint64_t *ptr,
> +				 uint64_t old,
> +				 uint64_t new)
> +{
> +	smp_mb();

I was looking at the existing code and noticed that we don't have a
corresponding smp_mb() in this position. Is it needed here because of
the 64-bitness?


> +	if (!__cmpxchg_case_8(ptr, &old, new, false, 0))
> +		ASSERT_UNREACHABLE();
> +
> +	return old;
> +}
> +
>  #endif
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
> index fc5c60f0bd74..de9cd0ee2b07 100644
> --- a/xen/include/asm-arm/arm64/cmpxchg.h
> +++ b/xen/include/asm-arm/arm64/cmpxchg.h
> @@ -187,6 +187,11 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>  	__ret; \
>  })
>  
> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> +
> +#define __cmpxchg64_mb_timeout(ptr, old, new, max_try) \
> +	__cmpxchg_mb_timeout(ptr, old, new, 8, max_try)
> +
>  #endif
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-arm/guest_atomics.h b/xen/include/asm-arm/guest_atomics.h
> index af27cc627bf3..28ce402bea79 100644
> --- a/xen/include/asm-arm/guest_atomics.h
> +++ b/xen/include/asm-arm/guest_atomics.h
> @@ -115,6 +115,28 @@ static inline unsigned long __guest_cmpxchg(struct domain *d,
>                                           (unsigned long)(n),\
>                                           sizeof (*(ptr))))
>  
> +static inline uint64_t guest_cmpxchg64(struct domain *d,
> +                                       volatile uint64_t *ptr,
> +                                       uint64_t old,
> +                                       uint64_t new)
> +{
> +    uint64_t oldval = old;
> +
> +    perfc_incr(atomics_guest);
> +
> +    if ( __cmpxchg64_mb_timeout(ptr, &oldval, new,
> +                                this_cpu(guest_safe_atomic_max)) )
> +        return oldval;
> +
> +    perfc_incr(atomics_guest_paused);
> +
> +    domain_pause_nosync(d);
> +    oldval = cmpxchg64(ptr, old, new);
> +    domain_unpause(d);
> +
> +    return oldval;
> +}
> +
>  #endif /* _ARM_GUEST_ATOMICS_H */
>  /*
>   * Local variables:
> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
> index 029417c8ffc1..f4de9d3631ff 100644
> --- a/xen/include/asm-x86/guest_atomics.h
> +++ b/xen/include/asm-x86/guest_atomics.h
> @@ -20,6 +20,8 @@
>      ((void)(d), test_and_change_bit(nr, p))
>  
>  #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
> +
>  
>  #endif /* _X86_GUEST_ATOMICS_H */
>  /*
> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
> index f471859c19cc..c1b16105e9f2 100644
> --- a/xen/include/asm-x86/x86_64/system.h
> +++ b/xen/include/asm-x86/x86_64/system.h
> @@ -5,6 +5,8 @@
>      ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>                                     (unsigned long)(n),sizeof(*(ptr))))
>  
> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
> +
>  /*
>   * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
>   * identical, store NEW in MEM.  Return the initial value in MEM.
> -- 
> 2.17.1
>
Julien Grall Aug. 18, 2020, 10:30 a.m. UTC | #10
Hi Stefano,

On 17/08/2020 23:56, Stefano Stabellini wrote:
> On Sat, 15 Aug 2020, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> The IOREQ code is using cmpxchg() with a 64-bit value. At the moment,
>> this is x86 code, but there are plans to make it common.
>>
>> To cater for 32-bit architectures, introduce two new helpers to deal
>> with 64-bit cmpxchg.
>>
>> The Arm 32-bit implementation of cmpxchg64() is based on the __cmpxchg64
>> in Linux v5.8 (arch/arm/include/asm/cmpxchg.h).
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>>   xen/include/asm-arm/arm32/cmpxchg.h | 68 +++++++++++++++++++++++++++++
>>   xen/include/asm-arm/arm64/cmpxchg.h |  5 +++
>>   xen/include/asm-arm/guest_atomics.h | 22 ++++++++++
>>   xen/include/asm-x86/guest_atomics.h |  2 +
>>   xen/include/asm-x86/x86_64/system.h |  2 +
>>   5 files changed, 99 insertions(+)
>>
>> diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
>> index 0770f272ee99..5e2fa6ee38a0 100644
>> --- a/xen/include/asm-arm/arm32/cmpxchg.h
>> +++ b/xen/include/asm-arm/arm32/cmpxchg.h
>> @@ -87,6 +87,38 @@ __CMPXCHG_CASE(b, 1)
>>   __CMPXCHG_CASE(h, 2)
>>   __CMPXCHG_CASE( , 4)
>>   
>> +static inline bool __cmpxchg_case_8(volatile uint64_t *ptr,
>> +			 	    uint64_t *old,
>> +			 	    uint64_t new,
>> +			 	    bool timeout,
>> +				    unsigned int max_try)
>> +{
>> +	uint64_t oldval;
>> +	uint64_t res;
>> +
>> +	do {
>> +		asm volatile(
>> +		"	ldrexd		%1, %H1, [%3]\n"
>> +		"	teq		%1, %4\n"
>> +		"	teqeq		%H1, %H4\n"
>> +		"	movne		%0, #0\n"
>> +		"	movne		%H0, #0\n"
>> +		"	bne		2f\n"
>> +		"	strexd		%0, %5, %H5, [%3]\n"
>> +		"	teq		%0, #0\n"
> 
> Apologies if I am misreading this code, but this last "teq" instruction
> doesn't seem to be useful?

Urgh, I forgot to remove it. The Linux version loops in assembly, but I
had to convert it to C in order to cater for the timeout.

I will remove it in the next version.
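
Presumably the loop would then look something like this (a sketch with
just the flagged line dropped; narrowing res to 32-bit is an additional
assumption, not something discussed here, since strexd only ever writes
a single status register):

	uint64_t oldval;
	uint32_t res;

	do {
		asm volatile(
		"	ldrexd		%1, %H1, [%3]\n"
		"	teq		%1, %4\n"
		"	teqeq		%H1, %H4\n"
		"	movne		%0, #0\n"
		"	bne		2f\n"
		"	strexd		%0, %5, %H5, [%3]\n"
		"2:"
		: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
		: "r" (ptr), "r" (*old), "r" (new)
		: "memory", "cc");
		if (!res)
			break;
	} while (!timeout || ((--max_try) > 0));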

> 
> 
>> +		"2:"
>> +		: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
>                                                ^ not used ?
> 
> 
>> +		: "r" (ptr), "r" (*old), "r" (new)
>> +		: "memory", "cc");
>> +		if (!res)
>> +			break;
>> +	} while (!timeout || ((--max_try) > 0));
>> +
>> +	*old = oldval;
>> +
>> +	return !res;
>> +}
>> +
>>   static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
>>   					unsigned long new, int size,
>>   					bool timeout, unsigned int max_try)
>> @@ -156,6 +188,30 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>>   	return ret;
>>   }
>>   
>> +/*
>> + * The helper may fail to update the memory if the action takes too long.
>> + *
>> + * @old: On call the value pointed contains the expected old value. It will be
>> + * updated to the actual old value.
>> + * @max_try: Maximum number of iterations
>> + *
>> + * The helper will return true when the update has succeeded (i.e no
>> + * timeout) and false if the update has failed.
>> + */
>> +static always_inline bool __cmpxchg64_mb_timeout(volatile uint64_t *ptr,
>> +						 uint64_t *old,
>> +						 uint64_t new,
>> +						 unsigned int max_try)
>> +{
>> +	bool ret;
>> +
>> +	smp_mb();
>> +	ret = __cmpxchg_case_8(ptr, old, new, true, max_try);
>> +	smp_mb();
>> +
>> +	return ret;
>> +}
>> +
>>   #define cmpxchg(ptr,o,n)						\
>>   	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
>>   					  (unsigned long)(o),		\
>> @@ -167,6 +223,18 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>>   				       (unsigned long)(o),		\
>>   				       (unsigned long)(n),		\
>>   				       sizeof(*(ptr))))
>> +
>> +static inline uint64_t cmpxchg64(volatile uint64_t *ptr,
>> +				 uint64_t old,
>> +				 uint64_t new)
>> +{
>> +	smp_mb();
> 
> I was looking at the existing code and noticed that we don't have a
> corresponding smp_mb() in this position. Is it needed here because of
> the 64-bitness?

We do have barriers in the existing code as well. The code can be a bit
confusing because __cmpxchg() refers to a local (barrier-less) cmpxchg.

In our case, the corresponding version is __cmpxchg_mb().
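
Roughly, the existing shape is (a sketch based on the patch context
rather than the full file):

	static always_inline unsigned long __cmpxchg_mb(volatile void *ptr,
							unsigned long old,
							unsigned long new,
							int size)
	{
		unsigned long ret;

		smp_mb();
		ret = __cmpxchg(ptr, old, new, size);
		smp_mb();

		return ret;
	}

i.e. the barriers already sit inside the _mb wrapper rather than at
the call site.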

To be honest, the existing naming is a bit confusing. I am thinking of
dropping cmpxchg_local() completely as it is not used. This would also
make the code easier to read. What do you think?


> 
> 
>> +	if (!__cmpxchg_case_8(ptr, &old, new, false, 0))
>> +		ASSERT_UNREACHABLE();

And I forgot the smp_mb() here :(.

>> +
>> +	return old;
>> +}
>> +
>>   #endif
>>   /*
>>    * Local variables:
>> diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
>> index fc5c60f0bd74..de9cd0ee2b07 100644
>> --- a/xen/include/asm-arm/arm64/cmpxchg.h
>> +++ b/xen/include/asm-arm/arm64/cmpxchg.h
>> @@ -187,6 +187,11 @@ static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
>>   	__ret; \
>>   })
>>   
>> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
>> +
>> +#define __cmpxchg64_mb_timeout(ptr, old, new, max_try) \
>> +	__cmpxchg_mb_timeout(ptr, old, new, 8, max_try)
>> +
>>   #endif
>>   /*
>>    * Local variables:
>> diff --git a/xen/include/asm-arm/guest_atomics.h b/xen/include/asm-arm/guest_atomics.h
>> index af27cc627bf3..28ce402bea79 100644
>> --- a/xen/include/asm-arm/guest_atomics.h
>> +++ b/xen/include/asm-arm/guest_atomics.h
>> @@ -115,6 +115,28 @@ static inline unsigned long __guest_cmpxchg(struct domain *d,
>>                                            (unsigned long)(n),\
>>                                            sizeof (*(ptr))))
>>   
>> +static inline uint64_t guest_cmpxchg64(struct domain *d,
>> +                                       volatile uint64_t *ptr,
>> +                                       uint64_t old,
>> +                                       uint64_t new)
>> +{
>> +    uint64_t oldval = old;
>> +
>> +    perfc_incr(atomics_guest);
>> +
>> +    if ( __cmpxchg64_mb_timeout(ptr, &oldval, new,
>> +                                this_cpu(guest_safe_atomic_max)) )
>> +        return oldval;
>> +
>> +    perfc_incr(atomics_guest_paused);
>> +
>> +    domain_pause_nosync(d);
>> +    oldval = cmpxchg64(ptr, old, new);
>> +    domain_unpause(d);
>> +
>> +    return oldval;
>> +}
>> +
>>   #endif /* _ARM_GUEST_ATOMICS_H */
>>   /*
>>    * Local variables:
>> diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
>> index 029417c8ffc1..f4de9d3631ff 100644
>> --- a/xen/include/asm-x86/guest_atomics.h
>> +++ b/xen/include/asm-x86/guest_atomics.h
>> @@ -20,6 +20,8 @@
>>       ((void)(d), test_and_change_bit(nr, p))
>>   
>>   #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
>> +#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
>> +
>>   
>>   #endif /* _X86_GUEST_ATOMICS_H */
>>   /*
>> diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
>> index f471859c19cc..c1b16105e9f2 100644
>> --- a/xen/include/asm-x86/x86_64/system.h
>> +++ b/xen/include/asm-x86/x86_64/system.h
>> @@ -5,6 +5,8 @@
>>       ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
>>                                      (unsigned long)(n),sizeof(*(ptr))))
>>   
>> +#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
>> +
>>   /*
>>    * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
>>    * identical, store NEW in MEM.  Return the initial value in MEM.
>> -- 
>> 2.17.1
>>

Cheers,
Jan Beulich Aug. 19, 2020, 9:18 a.m. UTC | #11
On 17.08.2020 13:50, Roger Pau Monné wrote:
> FWIW x86 already has a specific handler for 128bit values: cmpxchg16b.
> Maybe it would be better to name this cmpxchg8b? Or rename the
> existing one to cmpxchg128 for coherence.

cmpxchg16b() is named after the underlying insn. If we gain
cmpxchg64(), then I agree this one wants renaming to cmpxchg128()
at the same time.

Jan
Jan Beulich Aug. 19, 2020, 9:22 a.m. UTC | #12
On 17.08.2020 15:03, Julien Grall wrote:
> On 17/08/2020 12:50, Roger Pau Monné wrote:
>> On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
>>> The only way I could see to make it work would be to use the same trick as
>>> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
>>> union and a void pointer to avoid explicit casts.
>>
>> I'm mostly worried about common code having assumed that cmpxchg
>> does also handle 64bit sized parameters, and thus failing to use
>> cmpxchg64 when required. I assume this is not much of a deal as then
>> the Arm 32 build would fail, so it should be fairly easy to catch
>> those.
> FWIW, this is not very different to the existing approach. If one would 
> use cmpxchg() with 64-bit, then it would fail to compile.

A somewhat related question then: Do you really need both the
guest_* and the non-guest variants? Limiting things to plain
cmpxchg() would further reduce the risk of someone picking the
wrong one without right away noticing the build issue on Arm32.
For guest_cmpxchg{,64}() I think there's less of a risk.

Jan
Julien Grall Aug. 20, 2020, 9:14 a.m. UTC | #13
On 19/08/2020 10:22, Jan Beulich wrote:
> On 17.08.2020 15:03, Julien Grall wrote:
>> On 17/08/2020 12:50, Roger Pau Monné wrote:
>>> On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
>>>> The only way I could see to make it work would be to use the same trick as
>>>> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
>>>> union and a void pointer to avoid explicit casts.
>>>
>>> I'm mostly worried about common code having assumed that cmpxchg
>>> does also handle 64bit sized parameters, and thus failing to use
>>> cmpxchg64 when required. I assume this is not much of a deal as then
>>> the Arm 32 build would fail, so it should be fairly easy to catch
>>> those.
>> FWIW, this is not very different to the existing approach. If one would
>> use cmpxchg() with 64-bit, then it would fail to compile.
> 
> A somewhat related question then: Do you really need both the
> guest_* and the non-guest variants? Limiting things to plain
> cmpxchg() would further reduce the risk of someone picking the
> wrong one without right away noticing the build issue on Arm32.
> For guest_cmpxchg{,64}() I think there's less of a risk.

For the IOREQ code, we will need the guest_* version that is built on 
top of the non-guest variant.

I would like at least consistency between the two variants. IOW, if we 
decide to use the name guest_cmpxchg64(), then I would like to use 
cmpxchg64().

I still need to explore the code generated by cmpxchg() if I include 
support for 64-bit.

Cheers,
Jan Beulich Aug. 20, 2020, 9:25 a.m. UTC | #14
On 20.08.2020 11:14, Julien Grall wrote:
> 
> 
> On 19/08/2020 10:22, Jan Beulich wrote:
>> On 17.08.2020 15:03, Julien Grall wrote:
>>> On 17/08/2020 12:50, Roger Pau Monné wrote:
>>>> On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
>>>>> The only way I could see to make it work would be to use the same trick as
>>>>> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
>>>>> union and a void pointer to avoid explicit casts.
>>>>
>>>> I'm mostly worried about common code having assumed that cmpxchg
>>>> does also handle 64bit sized parameters, and thus failing to use
>>>> cmpxchg64 when required. I assume this is not much of a deal as then
>>>> the Arm 32 build would fail, so it should be fairly easy to catch
>>>> those.
>>> FWIW, this is not very different to the existing approach. If one would
>>> use cmpxchg() with 64-bit, then it would fail to compile.
>>
>> A somewhat related question then: Do you really need both the
>> guest_* and the non-guest variants? Limiting things to plain
>> cmpxchg() would further reduce the risk of someone picking the
>> wrong one without right away noticing the build issue on Arm32.
>> For guest_cmpxchg{,64}() I think there's less of a risk.
> 
> For the IOREQ code, we will need the guest_* version that is built on 
> top of the non-guest variant.
> 
> I would like at least consistency between the two variants. IOW, if we 
> decide to use the name guest_cmpxchg64(), then I would like to use 
> cmpxchg64().

On Arm, that is. There wouldn't be any need to expose cmpxchg64()
for use in common code, and hence not at all on x86, I guess?

Jan
Julien Grall Aug. 20, 2020, 9:34 a.m. UTC | #15
On 20/08/2020 10:25, Jan Beulich wrote:
> On 20.08.2020 11:14, Julien Grall wrote:
>>
>>
>> On 19/08/2020 10:22, Jan Beulich wrote:
>>> On 17.08.2020 15:03, Julien Grall wrote:
>>>> On 17/08/2020 12:50, Roger Pau Monné wrote:
>>>>> On Mon, Aug 17, 2020 at 12:05:54PM +0100, Julien Grall wrote:
>>>>>> The only way I could see to make it work would be to use the same trick as
>>>>>> we do for {read, write}_atomic() (see asm-arm/atomic.h). We are using a
>>>>>> union and a void pointer to avoid explicit casts.
>>>>>
>>>>> I'm mostly worried about common code having assumed that cmpxchg
>>>>> does also handle 64bit sized parameters, and thus failing to use
>>>>> cmpxchg64 when required. I assume this is not much of a deal as then
>>>>> the Arm 32 build would fail, so it should be fairly easy to catch
>>>>> those.
>>>> FWIW, this is not very different to the existing approach. If one would
>>>> use cmpxchg() with 64-bit, then it would fail to compile.
>>>
>>> A somewhat related question then: Do you really need both the
>>> guest_* and the non-guest variants? Limiting things to plain
>>> cmpxchg() would further reduce the risk of someone picking the
>>> wrong one without right away noticing the build issue on Arm32.
>>> For guest_cmpxchg{,64}() I think there's less of a risk.
>>
>> For the IOREQ code, we will need the guest_* version that is built on
>> top of the non-guest variant.
>>
>> I would like at least consistency between the two variants. IOW, if we
>> decide to use the name guest_cmpxchg64(), then I would like to use
>> cmpxchg64().
> 
> On Arm, that is. There wouldn't be any need to expose cmpxchg64()
> for use in common code, and hence not at all on x86, I guess?

Right, we would only need to introduce guest_cmpxchg64() for common code.

Cheers,

Patch

diff --git a/xen/include/asm-arm/arm32/cmpxchg.h b/xen/include/asm-arm/arm32/cmpxchg.h
index 0770f272ee99..5e2fa6ee38a0 100644
--- a/xen/include/asm-arm/arm32/cmpxchg.h
+++ b/xen/include/asm-arm/arm32/cmpxchg.h
@@ -87,6 +87,38 @@  __CMPXCHG_CASE(b, 1)
 __CMPXCHG_CASE(h, 2)
 __CMPXCHG_CASE( , 4)
 
+static inline bool __cmpxchg_case_8(volatile uint64_t *ptr,
+			 	    uint64_t *old,
+			 	    uint64_t new,
+			 	    bool timeout,
+				    unsigned int max_try)
+{
+	uint64_t oldval;
+	uint64_t res;
+
+	do {
+		asm volatile(
+		"	ldrexd		%1, %H1, [%3]\n"
+		"	teq		%1, %4\n"
+		"	teqeq		%H1, %H4\n"
+		"	movne		%0, #0\n"
+		"	movne		%H0, #0\n"
+		"	bne		2f\n"
+		"	strexd		%0, %5, %H5, [%3]\n"
+		"	teq		%0, #0\n"
+		"2:"
+		: "=&r" (res), "=&r" (oldval), "+Qo" (*ptr)
+		: "r" (ptr), "r" (*old), "r" (new)
+		: "memory", "cc");
+		if (!res)
+			break;
+	} while (!timeout || ((--max_try) > 0));
+
+	*old = oldval;
+
+	return !res;
+}
+
 static always_inline bool __int_cmpxchg(volatile void *ptr, unsigned long *old,
 					unsigned long new, int size,
 					bool timeout, unsigned int max_try)
@@ -156,6 +188,30 @@  static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
 	return ret;
 }
 
+/*
+ * The helper may fail to update the memory if the action takes too long.
+ *
+ * @old: On call the value pointed contains the expected old value. It will be
+ * updated to the actual old value.
+ * @max_try: Maximum number of iterations
+ *
+ * The helper will return true when the update has succeeded (i.e no
+ * timeout) and false if the update has failed.
+ */
+static always_inline bool __cmpxchg64_mb_timeout(volatile uint64_t *ptr,
+						 uint64_t *old,
+						 uint64_t new,
+						 unsigned int max_try)
+{
+	bool ret;
+
+	smp_mb();
+	ret = __cmpxchg_case_8(ptr, old, new, true, max_try);
+	smp_mb();
+
+	return ret;
+}
+
 #define cmpxchg(ptr,o,n)						\
 	((__typeof__(*(ptr)))__cmpxchg_mb((ptr),			\
 					  (unsigned long)(o),		\
@@ -167,6 +223,18 @@  static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
 				       (unsigned long)(o),		\
 				       (unsigned long)(n),		\
 				       sizeof(*(ptr))))
+
+static inline uint64_t cmpxchg64(volatile uint64_t *ptr,
+				 uint64_t old,
+				 uint64_t new)
+{
+	smp_mb();
+	if (!__cmpxchg_case_8(ptr, &old, new, false, 0))
+		ASSERT_UNREACHABLE();
+
+	return old;
+}
+
 #endif
 /*
  * Local variables:
diff --git a/xen/include/asm-arm/arm64/cmpxchg.h b/xen/include/asm-arm/arm64/cmpxchg.h
index fc5c60f0bd74..de9cd0ee2b07 100644
--- a/xen/include/asm-arm/arm64/cmpxchg.h
+++ b/xen/include/asm-arm/arm64/cmpxchg.h
@@ -187,6 +187,11 @@  static always_inline bool __cmpxchg_mb_timeout(volatile void *ptr,
 	__ret; \
 })
 
+#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
+
+#define __cmpxchg64_mb_timeout(ptr, old, new, max_try) \
+	__cmpxchg_mb_timeout(ptr, old, new, 8, max_try)
+
 #endif
 /*
  * Local variables:
diff --git a/xen/include/asm-arm/guest_atomics.h b/xen/include/asm-arm/guest_atomics.h
index af27cc627bf3..28ce402bea79 100644
--- a/xen/include/asm-arm/guest_atomics.h
+++ b/xen/include/asm-arm/guest_atomics.h
@@ -115,6 +115,28 @@  static inline unsigned long __guest_cmpxchg(struct domain *d,
                                          (unsigned long)(n),\
                                          sizeof (*(ptr))))
 
+static inline uint64_t guest_cmpxchg64(struct domain *d,
+                                       volatile uint64_t *ptr,
+                                       uint64_t old,
+                                       uint64_t new)
+{
+    uint64_t oldval = old;
+
+    perfc_incr(atomics_guest);
+
+    if ( __cmpxchg64_mb_timeout(ptr, &oldval, new,
+                                this_cpu(guest_safe_atomic_max)) )
+        return oldval;
+
+    perfc_incr(atomics_guest_paused);
+
+    domain_pause_nosync(d);
+    oldval = cmpxchg64(ptr, old, new);
+    domain_unpause(d);
+
+    return oldval;
+}
+
 #endif /* _ARM_GUEST_ATOMICS_H */
 /*
  * Local variables:
diff --git a/xen/include/asm-x86/guest_atomics.h b/xen/include/asm-x86/guest_atomics.h
index 029417c8ffc1..f4de9d3631ff 100644
--- a/xen/include/asm-x86/guest_atomics.h
+++ b/xen/include/asm-x86/guest_atomics.h
@@ -20,6 +20,8 @@ 
     ((void)(d), test_and_change_bit(nr, p))
 
 #define guest_cmpxchg(d, ptr, o, n) ((void)(d), cmpxchg(ptr, o, n))
+#define guest_cmpxchg64(d, ptr, o, n) ((void)(d), cmpxchg64(ptr, o, n))
+
 
 #endif /* _X86_GUEST_ATOMICS_H */
 /*
diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index f471859c19cc..c1b16105e9f2 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -5,6 +5,8 @@ 
     ((__typeof__(*(ptr)))__cmpxchg((ptr),(unsigned long)(o),            \
                                    (unsigned long)(n),sizeof(*(ptr))))
 
+#define cmpxchg64(ptr, o, n) cmpxchg(ptr, o, n)
+
 /*
  * Atomic 16 bytes compare and exchange.  Compare OLD with MEM, if
  * identical, store NEW in MEM.  Return the initial value in MEM.