[net] net: phy: fix a signedness bug in genphy_loopback()

Message ID	d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain (mailing list archive)
State	Changes Requested
Delegated to:	Netdev Maintainers
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 290DD742FE for <netdev@vger.kernel.org>; Fri, 26 May 2023 11:46:07 +0000 (UTC) Date: Fri, 26 May 2023 14:45:54 +0300 From: Dan Carpenter <dan.carpenter@linaro.org> To: Oleksij Rempel <linux@rempel-privat.de> Cc: Andrew Lunn <andrew@lunn.ch>, Heiner Kallweit <hkallweit1@gmail.com>, Russell King <linux@armlinux.org.uk>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, netdev@vger.kernel.org, kernel-janitors@vger.kernel.org Subject: [PATCH net] net: phy: fix a signedness bug in genphy_loopback() Message-ID: <d7bb312e-2428-45f6-b9b3-59ba544e8b94@kili.mountain> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	[net] net: phy: fix a signedness bug in genphy_loopback() \| expand [net] net: phy: fix a signedness bug in genphy_loopback()

Context	Check	Description
netdev/series_format	success	Single patches do not need cover letters
netdev/tree_selection	success	Clearly marked for net
netdev/fixes_present	success	Fixes tag present in non-next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers	success	CCed 9 of 9 maintainers
netdev/build_clang	success	Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	Fixes tag looks correct
netdev/build_allmodconfig_warn	success	Errors and warnings before: 8 this patch: 8
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Dan Carpenter May 26, 2023, 11:45 a.m. UTC

The "val" variable is used to store error codes from phy_read() so
it needs to be signed for the error handling to work as expected.

Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
---
 drivers/net/phy/phy_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Russell King (Oracle) May 26, 2023, 11:50 a.m. UTC | #1

On Fri, May 26, 2023 at 02:45:54PM +0300, Dan Carpenter wrote:
> The "val" variable is used to store error codes from phy_read() so
> it needs to be signed for the error handling to work as expected.
> 
> Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")

LGTM.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

Jakub Kicinski May 30, 2023, 4:58 a.m. UTC | #2

On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> The "val" variable is used to store error codes from phy_read() so
> it needs to be signed for the error handling to work as expected.
> 
> Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>

Is it going to be obvious to PHY-savvy folks that the val passed to
phy_read_poll_timeout() must be an int? Is it a very common pattern?
My outsider intuition is that since regs are 16b, u16 is reasonable,
and more people may make the same mistake. Therefore we should try to
fix phy_read_poll_timeout() instead to use a local variable like it
does for __ret. 

Weaker version would be to add a compile time check to ensure val 
is signed (assert(typeof(val)~0ULL < 0) or such?).

Opinions?

Russell King (Oracle) May 30, 2023, 9:01 a.m. UTC | #3

On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake. Therefore we should try to
> fix phy_read_poll_timeout() instead to use a local variable like it
> does for __ret. 
> 
> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).
> 
> Opinions?

Yes, I think that would be a saner approach, since
phy_read_poll_timeout() returns the error value, rather than using
the variable passed in. Andrew?

Paolo Abeni May 30, 2023, 9:06 a.m. UTC | #4

On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake. Therefore we should try to
> fix phy_read_poll_timeout() instead to use a local variable like it
> does for __ret. 
> 
> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).

FTR, a BUILD_BUG_ON() the above check spots issues in several places
(e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)

I think it should be better resort to a signed local variable in
phy_read_poll_timeout()

Thanks,

Paolo

Dan Carpenter May 30, 2023, 9:23 a.m. UTC | #5

On Tue, May 30, 2023 at 11:06:55AM +0200, Paolo Abeni wrote:
> On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > The "val" variable is used to store error codes from phy_read() so
> > > it needs to be signed for the error handling to work as expected.
> > > 
> > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > Is it going to be obvious to PHY-savvy folks that the val passed to
> > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > and more people may make the same mistake. Therefore we should try to
> > fix phy_read_poll_timeout() instead to use a local variable like it
> > does for __ret. 
> > 
> > Weaker version would be to add a compile time check to ensure val 
> > is signed (assert(typeof(val)~0ULL < 0) or such?).
> 
> FTR, a BUILD_BUG_ON() the above check spots issues in several places
> (e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)
> 

I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
then I only find the bug from this patch.

regards,
dan carpenter

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6478838405a08..f05fc25b77583 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1173,6 +1173,7 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
 ({ \
 	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
+	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
 	if (val < 0) \
 		__ret = val; \
 	if (__ret) \

Paolo Abeni May 30, 2023, 9:40 a.m. UTC | #6

On Tue, 2023-05-30 at 12:23 +0300, Dan Carpenter wrote:
> On Tue, May 30, 2023 at 11:06:55AM +0200, Paolo Abeni wrote:
> > On Mon, 2023-05-29 at 21:58 -0700, Jakub Kicinski wrote:
> > > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > > The "val" variable is used to store error codes from phy_read() so
> > > > it needs to be signed for the error handling to work as expected.
> > > > 
> > > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > > 
> > > Is it going to be obvious to PHY-savvy folks that the val passed to
> > > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > > and more people may make the same mistake. Therefore we should try to
> > > fix phy_read_poll_timeout() instead to use a local variable like it
> > > does for __ret. 
> > > 
> > > Weaker version would be to add a compile time check to ensure val 
> > > is signed (assert(typeof(val)~0ULL < 0) or such?).
> > 
> > FTR, a BUILD_BUG_ON() the above check spots issues in several places
> > (e.g. r8169_main.c, drivers/net/phy/phy_device.c, ...)
> > 
> 
> I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> then I only find the bug from this patch.
> 
> regards,
> dan carpenter
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 6478838405a08..f05fc25b77583 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1173,6 +1173,7 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
>  ({ \
>  	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
>  	if (val < 0) \
>  		__ret = val; \
>  	if (__ret) \
> 

Uhm... I have no idea what happened to my build host. I did see more
build errors in previous attempt, but now I only observe the one you
address with this patch. I guess some PEBKAC hit me here.

/P

Russell King (Oracle) May 30, 2023, 9:49 a.m. UTC | #7

On Tue, May 30, 2023 at 12:23:32PM +0300, Dan Carpenter wrote:
> I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> then I only find the bug from this patch.

I agree - inspecting the code reveals that "val" would be of type "int".

> +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\

I've just thrown this in to my builds, and building for arm64 using
debian stable's gcc, I don't see any errors with genphy_loopback()
suitably hacked, even with r8169 included in the build.

Russell King (Oracle) May 30, 2023, 10:06 a.m. UTC | #8

On Tue, May 30, 2023 at 10:49:22AM +0100, Russell King (Oracle) wrote:
> On Tue, May 30, 2023 at 12:23:32PM +0300, Dan Carpenter wrote:
> > I don't see an issue in r8169_main.c and in drivers/net/phy/phy_device.c
> > then I only find the bug from this patch.
> 
> I agree - inspecting the code reveals that "val" would be of type "int".
> 
> > +	BUILD_BUG_ON((typeof(val))~0ULL > 0);				\
> 
> I've just thrown this in to my builds, and building for arm64 using
> debian stable's gcc, I don't see any errors with genphy_loopback()
> suitably hacked, even with r8169 included in the build.

Also successfully built with 32-bit ARM gcc.

Andrew Lunn May 30, 2023, 12:39 p.m. UTC | #9

On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > The "val" variable is used to store error codes from phy_read() so
> > it needs to be signed for the error handling to work as expected.
> > 
> > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> 
> Is it going to be obvious to PHY-savvy folks that the val passed to
> phy_read_poll_timeout() must be an int? Is it a very common pattern?
> My outsider intuition is that since regs are 16b, u16 is reasonable,
> and more people may make the same mistake.

It is common to get this wrong in general with PHY drivers. Dan
regularly posts fixes like this soon after a PHY driver patch it
merged. I really wish we could somehow get the compiler to warn when
the result from phy_read() is stored into a unsigned type. It would
save Dan a lot of work.

> Therefore we should try to fix phy_read_poll_timeout() instead to
> use a local variable like it does for __ret.

The problem with that is val is supposed to be available to the
caller. I don't know if it is every actually used, but if it is, using
an internal signed variable and then throwing away the sign bit on
return is going to result in similar bugs.

> Weaker version would be to add a compile time check to ensure val 
> is signed (assert(typeof(val)~0ULL < 0) or such?).

I think the BUILD BUG is a better solution, since it catches all
problems. And at developer compile time, rather than at Dan runs his
static checker time.

	Andrew

Dan Carpenter May 30, 2023, 2:40 p.m. UTC | #10

On Tue, May 30, 2023 at 02:39:53PM +0200, Andrew Lunn wrote:
> On Mon, May 29, 2023 at 09:58:02PM -0700, Jakub Kicinski wrote:
> > On Fri, 26 May 2023 14:45:54 +0300 Dan Carpenter wrote:
> > > The "val" variable is used to store error codes from phy_read() so
> > > it needs to be signed for the error handling to work as expected.
> > > 
> > > Fixes: 014068dcb5b1 ("net: phy: genphy_loopback: add link speed configuration")
> > > Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> > 
> > Is it going to be obvious to PHY-savvy folks that the val passed to
> > phy_read_poll_timeout() must be an int? Is it a very common pattern?
> > My outsider intuition is that since regs are 16b, u16 is reasonable,
> > and more people may make the same mistake.
> 
> It is common to get this wrong in general with PHY drivers. Dan
> regularly posts fixes like this soon after a PHY driver patch it
> merged. I really wish we could somehow get the compiler to warn when
> the result from phy_read() is stored into a unsigned type. It would
> save Dan a lot of work.

I don't see these as much as I used.  It's maybe once per month.  I'm
not sure why, maybe kbuild emails everyone before I see it?  GCC will
warn about this with -Wtype-limits.  Clang will also trigger a warning.

The Smatch check for this had a bug where it only warned about if
(x < 0) { if x was u32 or larger.  I fixed that bug which is why I was
looking at this code.  I will push the fix for that in a couple days.

regards,
dan carpenter

Andrew Lunn May 30, 2023, 5:06 p.m. UTC | #11

> I don't see these as much as I used.  It's maybe once per month.  I'm
> not sure why, maybe kbuild emails everyone before I see it?  GCC will
> warn about this with -Wtype-limits.  Clang will also trigger a warning.

That is interesting. Maybe we should enable that in driver/net/net and
driver/net/pcs.

	Andrew

Jakub Kicinski May 30, 2023, 7:19 p.m. UTC | #12

On Tue, 30 May 2023 14:39:53 +0200 Andrew Lunn wrote:
> > Therefore we should try to fix phy_read_poll_timeout() instead to
> > use a local variable like it does for __ret.  
> 
> The problem with that is val is supposed to be available to the
> caller. I don't know if it is every actually used, but if it is, using
> an internal signed variable and then throwing away the sign bit on
> return is going to result in similar bugs.

This is what I meant FWIW:

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7addde5d14c0..829bd57b8794 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
 #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
 				timeout_us, sleep_before_read) \
 ({ \
-	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
+	int __ret, __val;						\
+									\
+	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
 		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
-	if (val < 0) \
-		__ret = val; \
+	val = __val;
+	if (__val < 0) \
+		__ret = __val; \
 	if (__ret) \
 		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
 	__ret; \


I tried enabling -Wtype-limits but it's _very_ noisy :(

Russell King (Oracle) May 30, 2023, 7:39 p.m. UTC | #13

On Tue, May 30, 2023 at 12:19:10PM -0700, Jakub Kicinski wrote:
> On Tue, 30 May 2023 14:39:53 +0200 Andrew Lunn wrote:
> > > Therefore we should try to fix phy_read_poll_timeout() instead to
> > > use a local variable like it does for __ret.  
> > 
> > The problem with that is val is supposed to be available to the
> > caller. I don't know if it is every actually used, but if it is, using
> > an internal signed variable and then throwing away the sign bit on
> > return is going to result in similar bugs.
> 
> This is what I meant FWIW:
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 7addde5d14c0..829bd57b8794 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
>  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>  				timeout_us, sleep_before_read) \
>  ({ \
> -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> +	int __ret, __val;						\
> +									\
> +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
>  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> -	if (val < 0) \
> -		__ret = val; \
> +	val = __val;
> +	if (__val < 0) \
> +		__ret = __val; \
>  	if (__ret) \
>  		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>  	__ret; \
> 
> 
> I tried enabling -Wtype-limits but it's _very_ noisy :(

Yes, looks good, that's what I thought you were meaning, and I totally
agree with it. Thanks!

Whatever we decide for this will also need to be applied to
phy_read_mmd_poll_timeout() as well.

Andrew Lunn May 30, 2023, 8:04 p.m. UTC | #14

> > This is what I meant FWIW:
> > 
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 7addde5d14c0..829bd57b8794 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
> >  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> >  				timeout_us, sleep_before_read) \
> >  ({ \
> > -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> > +	int __ret, __val;						\
> > +									\
> > +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
> >  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> > -	if (val < 0) \
> > -		__ret = val; \
> > +	val = __val;

This results in the sign being discarded if val is unsigned. Yes, the
test is remove, which i assume will stop Smatch complaining, but it is
still broken.

> > +	if (__val < 0) \
> > +		__ret = __val; \
> >  	if (__ret) \
> >  		phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
> >  	__ret; \

> > I tried enabling -Wtype-limits but it's _very_ noisy :(

This is a no go until GENMASK gets fixed :-(

However, if that is fixed, we might be able to turn it on. But it will
then trigger with this fix.

So i still think a BUILD_BUG_ON is a better fix. Help developers get
the code correct, rather than work around them getting it wrong.

I also wonder if this needs to go down a level. Dan, how often do you
see similar problems with the lower level read_poll_timeout() and
friends?

    Andrew

Russell King (Oracle) May 30, 2023, 9:09 p.m. UTC | #15

On Tue, May 30, 2023 at 10:04:52PM +0200, Andrew Lunn wrote:
> > > This is what I meant FWIW:
> > > 
> > > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > > index 7addde5d14c0..829bd57b8794 100644
> > > --- a/include/linux/phy.h
> > > +++ b/include/linux/phy.h
> > > @@ -1206,10 +1206,13 @@ static inline int phy_read(struct phy_device *phydev, u32 regnum)
> > >  #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> > >  				timeout_us, sleep_before_read) \
> > >  ({ \
> > > -	int __ret = read_poll_timeout(phy_read, val, val < 0 || (cond), \
> > > +	int __ret, __val;						\
> > > +									\
> > > +	__ret = read_poll_timeout(phy_read, __val, __val < 0 || (cond),	\
> > >  		sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> > > -	if (val < 0) \
> > > -		__ret = val; \
> > > +	val = __val;
> 
> This results in the sign being discarded if val is unsigned. Yes, the
> test is remove, which i assume will stop Smatch complaining, but it is
> still broken.

I was going to ask you to explain that, but having thought about
this more, there's much bigger problems with the proposal.

First, if I'm understanding you correctly, your point doesn't seem
relevant, because if val is unsigned, we have an implicit cast from a
signed int to an unsigned int _at_ _some_ _point_. With the existing
code, that implicit cast is buried inside read_poll_timeout(), here
to be exact:

	(val) = op(args);

because "op" will be one of the phy_read*() functions that returns an
"int", but "val" is unsigned - which means there's an implicit cast
here. Jakub's patch moves that cast after read_poll_timeout().

The elephant in the room has nothing to do with this, but everything
to do with "cond". "cond" is an expression to be evaluated inside the
loop, which must have access to the value read from the phy_read*()
function, and that value is referenced via whatever variable was
provided via "val". So changing "val" immediately breaks "cond".

Having thought about this, the best I can come up with is this, which
I think gives us everything we want without needing BUILD_BUG_ONs:

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
                                timeout_us, sleep_before_read) \
({ \
        int __ret, __val;
	__ret = read_poll_timeout(__val = phy_read, val, __val < 0 || (cond), \
                sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
        if (__val < 0) \
                __ret = __val; \
        if (__ret) \
                phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
        __ret; \
})

This looks rather horrid, but what it essentially does is:

                (val) = op(args); \
                if (cond) \
                        break; \

expands to:

		(val) = __val = phy_read(args);
		if (__val < 0 || (cond))
			break;

As phy_read() returns an int, there is no cast or loss assigning it
to __val, since that is also an int. The conversion from int to
something else happens at the same point it always has.

Hmm?

Russell King (Oracle) May 30, 2023, 9:28 p.m. UTC | #16

On Tue, May 30, 2023 at 10:09:24PM +0100, Russell King (Oracle) wrote:
> Having thought about this, the best I can come up with is this, which
> I think gives us everything we want without needing BUILD_BUG_ONs:
> 
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>                                 timeout_us, sleep_before_read) \
> ({ \
>         int __ret, __val;
> 	__ret = read_poll_timeout(__val = phy_read, val, __val < 0 || (cond), \
>                 sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
>         if (__val < 0) \
>                 __ret = __val; \
>         if (__ret) \
>                 phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>         __ret; \
> })
> 
> This looks rather horrid, but what it essentially does is:
> 
>                 (val) = op(args); \
>                 if (cond) \
>                         break; \
> 
> expands to:
> 
> 		(val) = __val = phy_read(args);
> 		if (__val < 0 || (cond))
> 			break;
> 
> As phy_read() returns an int, there is no cast or loss assigning it
> to __val, since that is also an int. The conversion from int to
> something else happens at the same point it always has.

... and actually produces nicer code on 32-bit ARM:

Old (with the u16 val changed to an int val):

 2f8:   ebfffffe        bl      0 <mdiobus_read>
 2fc:   e7e03150        ubfx    r3, r0, #2, #1		extract bit 2 into r3
 300:   e1a04000        mov     r4, r0			save return value
 304:   e2002004        and     r2, r0, #4		extract bit 2 again
 308:   e1933fa0        orrs    r3, r3, r0, lsr #31	grab sign bit
 30c:   1a00000d        bne     348 <genphy_loopback+0xd8>
		breaks out of loop if r3 is nonzero
	... rest of loop ...
...
 348:   e3520000        cmp     r2, #0
 34c:   0a00000b        beq     380 <genphy_loopback+0x110>
		basically tests whether bit 2 was zero, and jumps if it
		was. Basically (cond) is false.

 350:   e3540000        cmp     r4, #0
 354:   a3a04000        movge   r4, #0
 358:   ba00000a        blt     388 <genphy_loopback+0x118>
		tests whether a phy_read returned an error and jumps
		if it did. r4 is basically __ret.
...

 380:   e3540000        cmp     r4, #0
 384:   a3e0406d        mvnge   r4, #109        ; 0x6d
		if r4 (__ret) was >= 0, sets an error code (-ETIMEDOUT).
 388:   e1a03004        mov     r3, r4
 ... dev_err() bit.

The new generated code is:

 2f8:   ebfffffe        bl      0 <mdiobus_read>
                        2f8: R_ARM_CALL mdiobus_read
 2fc:   e2504000        subs    r4, r0, #0		__val assignment
 300:   ba000014        blt     358 <genphy_loopback+0xe8>
		if <0, go direct to dev_err code
 304:   e3140004        tst     r4, #4			cond test within loop
 308:   1a00000d        bne     344 <genphy_loopback+0xd4>
	... rest of loop ...

 344:   e6ff4074        uxth    r4, r4			cast to 16-bit uint
 348:   e3140004        tst     r4, #4			test
 34c:   13a04000        movne   r4, #0			__ret is zero if bit set
 350:   1a000007        bne     374 <genphy_loopback+0x104> basically returns
 354:   e3e0406d        mvn     r4, #109        ; 0x6d
	... otherwise sets __ret to -ETIMEDOUT
	... dev_err() code

Is there a reason why it was written (cond) || val < 0 rather than
val < 0 || (cond) ? Note that the order of these tests makes no
difference in this situation, but I'm wondering whether it was
intentional?

[net] net: phy: fix a signedness bug in genphy_loopback()

Checks

Commit Message

Comments

Patch