diff mbox series

[XEN,1/2] xen: Fix check-endbr with mawk

Message ID 20220714143907.25938-2-anthony.perard@citrix.com (mailing list archive)
State New, archived
Headers show
Series xen: check-endbr.sh fix and improvement | expand

Commit Message

Anthony PERARD July 14, 2022, 2:39 p.m. UTC
check-endbr.sh works well with gawk, but fails with mawk. The produced
$ALL file is smaller, it is missing 0x$vma_lo on every line. On mawk,
int(0x2A) just produce 0, instead of the expected value.

The use of hexadecimal-constant in awk is an optional part of the
posix spec, and mawk doesn't seems to implemented.

There is a way to convert an hexadecimal to a number be putting it in
a string, and awk as I understand is supposed to use strtod() to
convert the string to a number when needed. The expression
'int("0x15") + 21' would produce the expected value in `mawk` but now
`gawk` won't convert the string to a number unless we use the option
"--non-decimal-data".

So let's convert the hexadecimal number before using it in the awk
script. The shell as no issue with dealing with hexadecimal-constant
so we'll simply use the expression "$(( 0x15 ))" to convert the value
before using it in awk.

Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
 xen/tools/check-endbr.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bertrand Marquis July 14, 2022, 2:57 p.m. UTC | #1
Hi Anthony,

> On 14 Jul 2022, at 15:39, Anthony PERARD <anthony.perard@citrix.com> wrote:
> 
> check-endbr.sh works well with gawk, but fails with mawk. The produced
> $ALL file is smaller, it is missing 0x$vma_lo on every line. On mawk,
> int(0x2A) just produce 0, instead of the expected value.
> 
> The use of hexadecimal-constant in awk is an optional part of the
> posix spec, and mawk doesn't seems to implemented.
> 
> There is a way to convert an hexadecimal to a number be putting it in
> a string, and awk as I understand is supposed to use strtod() to
> convert the string to a number when needed. The expression
> 'int("0x15") + 21' would produce the expected value in `mawk` but now
> `gawk` won't convert the string to a number unless we use the option
> "--non-decimal-data".
> 
> So let's convert the hexadecimal number before using it in the awk
> script. The shell as no issue with dealing with hexadecimal-constant
> so we'll simply use the expression "$(( 0x15 ))" to convert the value
> before using it in awk.
> 
> Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
> Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
> Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Very nice solution

Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

Cheers
Bertrand
Andrew Cooper July 14, 2022, 3:12 p.m. UTC | #2
On 14/07/2022 15:39, Anthony PERARD wrote:
> check-endbr.sh works well with gawk, but fails with mawk. The produced
> $ALL file is smaller, it is missing 0x$vma_lo on every line. On mawk,
> int(0x2A) just produce 0, instead of the expected value.
>
> The use of hexadecimal-constant in awk is an optional part of the
> posix spec, and mawk doesn't seems to implemented.
>
> There is a way to convert an hexadecimal to a number be putting it in
> a string, and awk as I understand is supposed to use strtod() to
> convert the string to a number when needed. The expression
> 'int("0x15") + 21' would produce the expected value in `mawk` but now
> `gawk` won't convert the string to a number unless we use the option
> "--non-decimal-data".
>
> So let's convert the hexadecimal number before using it in the awk
> script. The shell as no issue with dealing with hexadecimal-constant
> so we'll simply use the expression "$(( 0x15 ))" to convert the value
> before using it in awk.
>
> Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
> Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
> Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks for doing this.  You beat me to it.

On policy first, we have https://gitlab.com/xen-project/xen/-/issues/26
open for tracking this bug.

We should consider having

Resolves xen-project/xen#26

in our list of tags, so Gitlab can properly cross-reference this fix. 
(I wonder if Resolves: works...)

https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically
is the full list of patterns available, but I think we want to keep
Fixes: for it's current meaning.


I also want to wait for the patchew CI run to complete because we've got
several build environments which have been a fertile source of shell
related bugs.

~Andrew
Andrew Cooper July 14, 2022, 6:24 p.m. UTC | #3
On 14/07/2022 15:39, Anthony PERARD wrote:
> diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
> index 552f233912..64fa9a56b7 100755
> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -78,7 +78,7 @@ then
>  else
>      grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
>           -e "$(printf '\146\17\37\1')" $TEXT_BIN
> -fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
> +fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
>  
>  # Wait for $VALID to become complete
>  wait

I thought I'd found a cunning way to simply this, but alas.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>, but this
warrants a comment, so I've added this hunk too:

diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index f633846b0f79..80955f74c71c 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -64,6 +64,11 @@ ${OBJDUMP} -j .text $1 -d -w | grep '        endbr64
*$' | cut -f 1 -d ':' > $VALID &
 #    split the VMA in half so AWK's numeric addition is only working on
32 bit
 #    numbers, which don't lose precision.
 #
+# 4) MAWK doesn't support plain hex constants (an optional part of the
POSIX
+#    spec), and GAWK and MAWK can't agree on how to work with hex constants
+#    in a string.  Use the shell to convert $vma_lo to decimal before
passing
+#    to AWK.
+#
 eval $(${OBJDUMP} -j .text $1 -h |
     $AWK '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1,
8), substr($4, 9, 16)}')
Andrew Cooper July 14, 2022, 6:59 p.m. UTC | #4
On 14/07/2022 16:12, Andrew Cooper wrote:
> On 14/07/2022 15:39, Anthony PERARD wrote:
>> check-endbr.sh works well with gawk, but fails with mawk. The produced
>> $ALL file is smaller, it is missing 0x$vma_lo on every line. On mawk,
>> int(0x2A) just produce 0, instead of the expected value.
>>
>> The use of hexadecimal-constant in awk is an optional part of the
>> posix spec, and mawk doesn't seems to implemented.
>>
>> There is a way to convert an hexadecimal to a number be putting it in
>> a string, and awk as I understand is supposed to use strtod() to
>> convert the string to a number when needed. The expression
>> 'int("0x15") + 21' would produce the expected value in `mawk` but now
>> `gawk` won't convert the string to a number unless we use the option
>> "--non-decimal-data".
>>
>> So let's convert the hexadecimal number before using it in the awk
>> script. The shell as no issue with dealing with hexadecimal-constant
>> so we'll simply use the expression "$(( 0x15 ))" to convert the value
>> before using it in awk.
>>
>> Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
>> Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
>> Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
> Thanks for doing this.  You beat me to it.
>
> On policy first, we have https://gitlab.com/xen-project/xen/-/issues/26
> open for tracking this bug.
>
> We should consider having
>
> Resolves xen-project/xen#26
>
> in our list of tags, so Gitlab can properly cross-reference this fix. 
> (I wonder if Resolves: works...)

Yes it does.  Gitlab successfully cross-referenced my dev branch ...

>
> https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically
> is the full list of patterns available, but I think we want to keep
> Fixes: for it's current meaning.
>
>
> I also want to wait for the patchew CI run to complete

... pushed because patchew failed to pick the series up for some reason.

~Andrew
Jan Beulich July 15, 2022, 10:14 a.m. UTC | #5
On 14.07.2022 16:39, Anthony PERARD wrote:
> --- a/xen/tools/check-endbr.sh
> +++ b/xen/tools/check-endbr.sh
> @@ -78,7 +78,7 @@ then
>  else
>      grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
>           -e "$(printf '\146\17\37\1')" $TEXT_BIN
> -fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
> +fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL

I'm afraid that's not portable to environments where sizeof(long) < 8.
The shell isn't required to use wider than long for the evaluation.

Jan
Andrew Cooper July 15, 2022, 10:35 a.m. UTC | #6
On 14/07/2022 19:59, Andrew Cooper wrote:
> On 14/07/2022 16:12, Andrew Cooper wrote:
>> On 14/07/2022 15:39, Anthony PERARD wrote:
>>> check-endbr.sh works well with gawk, but fails with mawk. The produced
>>> $ALL file is smaller, it is missing 0x$vma_lo on every line. On mawk,
>>> int(0x2A) just produce 0, instead of the expected value.
>>>
>>> The use of hexadecimal-constant in awk is an optional part of the
>>> posix spec, and mawk doesn't seems to implemented.
>>>
>>> There is a way to convert an hexadecimal to a number be putting it in
>>> a string, and awk as I understand is supposed to use strtod() to
>>> convert the string to a number when needed. The expression
>>> 'int("0x15") + 21' would produce the expected value in `mawk` but now
>>> `gawk` won't convert the string to a number unless we use the option
>>> "--non-decimal-data".
>>>
>>> So let's convert the hexadecimal number before using it in the awk
>>> script. The shell as no issue with dealing with hexadecimal-constant
>>> so we'll simply use the expression "$(( 0x15 ))" to convert the value
>>> before using it in awk.
>>>
>>> Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
>>> Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
>>> Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
>>> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
>> Thanks for doing this.  You beat me to it.
>>
>> On policy first, we have https://gitlab.com/xen-project/xen/-/issues/26
>> open for tracking this bug.
>>
>> We should consider having
>>
>> Resolves xen-project/xen#26
>>
>> in our list of tags, so Gitlab can properly cross-reference this fix. 
>> (I wonder if Resolves: works...)
> Yes it does.  Gitlab successfully cross-referenced my dev branch ...
>
>> https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically
>> is the full list of patterns available, but I think we want to keep
>> Fixes: for it's current meaning.
>>
>>
>> I also want to wait for the patchew CI run to complete
> ... pushed because patchew failed to pick the series up for some reason.

This series is now fully acked/reviewed and ready, and passed CI (well -
the bits of CI which aren't broken for other reasons).

Given the lack of objections, I'm going to use this patch alone as an
experiment to see how Resolves: works through other bits of our workflow
too.  Unless someone objects very promptly.

~Andrew
Andrew Cooper July 15, 2022, 10:43 a.m. UTC | #7
On 15/07/2022 11:14, Jan Beulich wrote:
> On 14.07.2022 16:39, Anthony PERARD wrote:
>> --- a/xen/tools/check-endbr.sh
>> +++ b/xen/tools/check-endbr.sh
>> @@ -78,7 +78,7 @@ then
>>  else
>>      grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
>>           -e "$(printf '\146\17\37\1')" $TEXT_BIN
>> -fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
>> +fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
> I'm afraid that's not portable to environments where sizeof(long) < 8.
> The shell isn't required to use wider than long for the evaluation.

Yuck.  This works at the moment in 32 bit builds because:

++ objdump -j .text xen-syms -h
++ awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1,
8), substr($4, 9, 16)}'
+ eval vma_hi=ffff82d0 vma_lo=40200000
++ vma_hi=ffff82d0
++ vma_lo=40200000

so the top bit isn't set.

And going from an 8/8 split to a 9/7 split doesn't work either because
that uses 4 bits and we've only got 2 to play with given .text's 1G
alignment.

I know it's disgusting, but how about a BUILD_BUG_ON(XEN_VIRT_START &
(1u << 31)) ?

~Andrew
Jan Beulich July 15, 2022, 10:51 a.m. UTC | #8
On 15.07.2022 12:43, Andrew Cooper wrote:
> On 15/07/2022 11:14, Jan Beulich wrote:
>> On 14.07.2022 16:39, Anthony PERARD wrote:
>>> --- a/xen/tools/check-endbr.sh
>>> +++ b/xen/tools/check-endbr.sh
>>> @@ -78,7 +78,7 @@ then
>>>  else
>>>      grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
>>>           -e "$(printf '\146\17\37\1')" $TEXT_BIN
>>> -fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
>>> +fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
>> I'm afraid that's not portable to environments where sizeof(long) < 8.
>> The shell isn't required to use wider than long for the evaluation.
> 
> Yuck.  This works at the moment in 32 bit builds because:
> 
> ++ objdump -j .text xen-syms -h
> ++ awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1,
> 8), substr($4, 9, 16)}'
> + eval vma_hi=ffff82d0 vma_lo=40200000
> ++ vma_hi=ffff82d0
> ++ vma_lo=40200000
> 
> so the top bit isn't set.
> 
> And going from an 8/8 split to a 9/7 split doesn't work either because
> that uses 4 bits and we've only got 2 to play with given .text's 1G
> alignment.

Why does text alignment matter here? All we care about are offsets
into the Xen image. An I guess we aren't really at risk of going
beyond 256M in image size ...

> I know it's disgusting, but how about a BUILD_BUG_ON(XEN_VIRT_START &
> (1u << 31)) ?

In the worst case, why not. But that would be an odd restriction on
changes to the memory layout (which we've done in the past).

Jan
Andrew Cooper July 15, 2022, 11:04 a.m. UTC | #9
On 15/07/2022 11:51, Jan Beulich wrote:
> On 15.07.2022 12:43, Andrew Cooper wrote:
>> On 15/07/2022 11:14, Jan Beulich wrote:
>>> On 14.07.2022 16:39, Anthony PERARD wrote:
>>>> --- a/xen/tools/check-endbr.sh
>>>> +++ b/xen/tools/check-endbr.sh
>>>> @@ -78,7 +78,7 @@ then
>>>>  else
>>>>      grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
>>>>           -e "$(printf '\146\17\37\1')" $TEXT_BIN
>>>> -fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
>>>> +fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
>>> I'm afraid that's not portable to environments where sizeof(long) < 8.
>>> The shell isn't required to use wider than long for the evaluation.
>> Yuck.  This works at the moment in 32 bit builds because:
>>
>> ++ objdump -j .text xen-syms -h
>> ++ awk '$2 == ".text" {printf "vma_hi=%s\nvma_lo=%s\n", substr($4, 1,
>> 8), substr($4, 9, 16)}'
>> + eval vma_hi=ffff82d0 vma_lo=40200000
>> ++ vma_hi=ffff82d0
>> ++ vma_lo=40200000
>>
>> so the top bit isn't set.
>>
>> And going from an 8/8 split to a 9/7 split doesn't work either because
>> that uses 4 bits and we've only got 2 to play with given .text's 1G
>> alignment.
> Why does text alignment matter here? All we care about are offsets
> into the Xen image. An I guess we aren't really at risk of going
> beyond 256M in image size ...

Very good point, but it's not even the image size.  It's only .text.

Furthermore, we can have the safety check for that in this script too.

Lemme see what I can do.

>> I know it's disgusting, but how about a BUILD_BUG_ON(XEN_VIRT_START &
>> (1u << 31)) ?
> In the worst case, why not. But that would be an odd restriction on
> changes to the memory layout (which we've done in the past).

I did say disgusting...  I'm entirely happy that it doesn't appear to be
needed.

~Andrew
diff mbox series

Patch

diff --git a/xen/tools/check-endbr.sh b/xen/tools/check-endbr.sh
index 552f233912..64fa9a56b7 100755
--- a/xen/tools/check-endbr.sh
+++ b/xen/tools/check-endbr.sh
@@ -78,7 +78,7 @@  then
 else
     grep -aob -e "$(printf '\363\17\36\372')" -e "$(printf '\363\17\36\373')" \
          -e "$(printf '\146\17\37\1')" $TEXT_BIN
-fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int(0x'$vma_lo') + $1}' > $ALL
+fi | awk -F':' '{printf "%s%x\n", "'$vma_hi'", int('$((0x$vma_lo))') + $1}' > $ALL
 
 # Wait for $VALID to become complete
 wait