diff mbox series

[net-next] net: phy: aquantia: drop wrong endianness conversion for addr and CRC

Message ID 20231122170813.1222-1-ansuelsmth@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: phy: aquantia: drop wrong endianness conversion for addr and CRC | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/codegen success Generated files up to date
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1127 this patch: 1127
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 1154 this patch: 1154
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1154 this patch: 1154
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Christian Marangi Nov. 22, 2023, 5:08 p.m. UTC
On further testing on BE target with kernel test robot, it was notice
that the endianness conversion for addr and CRC in fw_load_memory was
wrong and actually not needed. Values in define doesn't get converted
and are passed as is and hardcoded values are already in what the PHY
require, that is LE.

Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
_swab32 instead, the word is taked from firmware and is always LE, the
mailbox will emit a BE CRC hence the word needs to be always swapped and
the endianness of the host needs to be ignored.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311210414.sEJZjlcD-lkp@intel.com/
Fixes: e93984ebc1c8 ("net: phy: aquantia: add firmware load support")
Tested-by: Robert Marko <robimarko@gmail.com> # ipq8072 LE device
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
Replacement of net: phy: aquantia: make mailbox interface4 lsw addr
mask more specific

 drivers/net/phy/aquantia/aquantia_firmware.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Russell King (Oracle) Nov. 22, 2023, 5:24 p.m. UTC | #1
On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> On further testing on BE target with kernel test robot, it was notice
> that the endianness conversion for addr and CRC in fw_load_memory was
> wrong and actually not needed. Values in define doesn't get converted
> and are passed as is and hardcoded values are already in what the PHY
> require, that is LE.
> 
> Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> _swab32 instead, the word is taked from firmware and is always LE, the

                               taken

> mailbox will emit a BE CRC hence the word needs to be always swapped and
> the endianness of the host needs to be ignored.

I'm not convinced. If the firmware is a bytestream (as most "files" are)
then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:

ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
0x01	0x02	0x03	0x04	0x04030201	0x01020304

So, endianness matters here, and I think as Jakub already suggested, you
need to use get_unaligned_le32().

> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> index c5f292b1c4c8..bd093633d0cf 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -93,9 +93,9 @@ static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
>  	u16 crc = 0, up_crc;
>  	size_t pos;
>  
> -	/* PHY expect addr in LE */
> -	addr = (__force u32)cpu_to_le32(addr);
> -
> +	/* PHY expect addr in LE. Hardcoded addr in defines are
> +	 * already in this format.
> +	 */
>  	phy_write_mmd(phydev, MDIO_MMD_VEND1,
>  		      VEND1_GLOBAL_MAILBOX_INTERFACE1,
>  		      VEND1_GLOBAL_MAILBOX_INTERFACE1_CRC_RESET);
> @@ -128,7 +128,7 @@ static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
>  		 * We convert word to big-endian as PHY is BE and mailbox will
>  		 * return a BE CRC.
>  		 */
> -		word = (__force u32)cpu_to_be32(word);
> +		word = __swab32(word);
>  		crc = crc_ccitt_false(crc, (u8 *)&word, sizeof(word));

Again, I think you need to be careful with the endianness here again.
From what I understand here, it seems the CRC needs to be generated by
looking at the byte at ptr[3] first, then ptr[2], ptr[1] and finally
ptr[0] ?

If that is the case, the problem is using __swab32() on LE will do the
job for you, but on BE machines, it will be wrong.

I would make this explicit:

		u8 crc_data[4];

		...

		/* CRC is calculated using BE order */
		crc_data[0] = word >> 24;
		crc_data[1] = word >> 16;
		crc_data[2] = word >> 8;
		crc_data[3] = word;

		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));

which will be (a) completely unambiguous, and (b) completely
independent of the host endianness.
Christian Marangi Nov. 22, 2023, 5:53 p.m. UTC | #2
On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > On further testing on BE target with kernel test robot, it was notice
> > that the endianness conversion for addr and CRC in fw_load_memory was
> > wrong and actually not needed. Values in define doesn't get converted
> > and are passed as is and hardcoded values are already in what the PHY
> > require, that is LE.
> > 
> > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > _swab32 instead, the word is taked from firmware and is always LE, the
> 
>                                taken
> 
> > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > the endianness of the host needs to be ignored.
> 
> I'm not convinced. If the firmware is a bytestream (as most "files" are)
> then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> 
> ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> 
> So, endianness matters here, and I think as Jakub already suggested, you
> need to use get_unaligned_le32().
>

So they DO get converted to the HOST endian on reading the firmware from
an nvmem cell or a filesystem?

Again this is really dumping raw data from the read file directly to the
mailbox. Unless phy_write does some conversion internally, but in that
case how does it know what endian is the PHY internally?

> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index c5f292b1c4c8..bd093633d0cf 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -93,9 +93,9 @@ static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
> >  	u16 crc = 0, up_crc;
> >  	size_t pos;
> >  
> > -	/* PHY expect addr in LE */
> > -	addr = (__force u32)cpu_to_le32(addr);
> > -
> > +	/* PHY expect addr in LE. Hardcoded addr in defines are
> > +	 * already in this format.
> > +	 */
> >  	phy_write_mmd(phydev, MDIO_MMD_VEND1,
> >  		      VEND1_GLOBAL_MAILBOX_INTERFACE1,
> >  		      VEND1_GLOBAL_MAILBOX_INTERFACE1_CRC_RESET);
> > @@ -128,7 +128,7 @@ static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
> >  		 * We convert word to big-endian as PHY is BE and mailbox will
> >  		 * return a BE CRC.
> >  		 */
> > -		word = (__force u32)cpu_to_be32(word);
> > +		word = __swab32(word);
> >  		crc = crc_ccitt_false(crc, (u8 *)&word, sizeof(word));
> 
> Again, I think you need to be careful with the endianness here again.
> From what I understand here, it seems the CRC needs to be generated by
> looking at the byte at ptr[3] first, then ptr[2], ptr[1] and finally
> ptr[0] ?
> 
> If that is the case, the problem is using __swab32() on LE will do the
> job for you, but on BE machines, it will be wrong.
> 
> I would make this explicit:
> 
> 		u8 crc_data[4];
> 
> 		...
> 
> 		/* CRC is calculated using BE order */
> 		crc_data[0] = word >> 24;
> 		crc_data[1] = word >> 16;
> 		crc_data[2] = word >> 8;
> 		crc_data[3] = word;
> 
> 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> 
> which will be (a) completely unambiguous, and (b) completely
> independent of the host endianness.

But isn't this exactly what is done with ___constant_swab32 ?
__swab32 should not change if the HOST is BE or LE.

The real question is if word is converted. (by either the read API on
reading the FW or by phy_write on writing the thing to mailbox) (the
test are done on a LE HOST)

Our theory is that mailbox takes LE and internally converts to BE (as
the PHY is BE) but the CRC reg calculates the CRC out of the converted
data aka it does calculates the CRC from the BE data (converted
internally).
Jakub Kicinski Nov. 22, 2023, 6:23 p.m. UTC | #3
On Wed, 22 Nov 2023 18:53:39 +0100 Christian Marangi wrote:
> So they DO get converted to the HOST endian on reading the firmware from
> an nvmem cell or a filesystem?

They don't get converted when "reading from nvmem / fs". 
They get converted when you do:

		word = get_unaligned((const u32 *)(data + pos));

get_unaligned() is basically:

#if BIGENDIAN
#define		get_unaligned	get_unaligned_be32
#else
#define		get_unaligned	get_unaligned_le32
#endif

so you'll get different behavior here depending on the CPU.
Christian Marangi Nov. 22, 2023, 6:39 p.m. UTC | #4
On Wed, Nov 22, 2023 at 10:23:47AM -0800, Jakub Kicinski wrote:
> On Wed, 22 Nov 2023 18:53:39 +0100 Christian Marangi wrote:
> > So they DO get converted to the HOST endian on reading the firmware from
> > an nvmem cell or a filesystem?
> 
> They don't get converted when "reading from nvmem / fs". 
> They get converted when you do:
> 
> 		word = get_unaligned((const u32 *)(data + pos));
> 
> get_unaligned() is basically:
> 
> #if BIGENDIAN
> #define		get_unaligned	get_unaligned_be32
> #else
> #define		get_unaligned	get_unaligned_le32
> #endif
> 
> so you'll get different behavior here depending on the CPU.

Ugh... If that is true this is bad...

When get_unaligned was suggested, I checked if the thing was doing any
kind of conversion and from [1] I tought it was just getting the
pointer.

I can't find the entry where the thing is done. Is this some kind of
include magic with asm specific API?

[1] https://elixir.bootlin.com/linux/latest/source/include/asm-generic/unaligned.h#L22
Russell King (Oracle) Nov. 22, 2023, 6:53 p.m. UTC | #5
On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > On further testing on BE target with kernel test robot, it was notice
> > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > wrong and actually not needed. Values in define doesn't get converted
> > > and are passed as is and hardcoded values are already in what the PHY
> > > require, that is LE.
> > > 
> > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > _swab32 instead, the word is taked from firmware and is always LE, the
> > 
> >                                taken
> > 
> > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > the endianness of the host needs to be ignored.
> > 
> > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > 
> > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > 
> > So, endianness matters here, and I think as Jakub already suggested, you
> > need to use get_unaligned_le32().
> >
> 
> So they DO get converted to the HOST endian on reading the firmware from
> an nvmem cell or a filesystem?

I don't like "converted". It's *not* a conversion. It's a fundamental
property of accessing memory using different sizes of access.

As I attempted to explain above, if you have a file, and byte 0
contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
0xCC, and byte 3 contains 0xDD, then if you read that file byte by
byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.

If you map that file into memory, e.g. in userspace, using mmap(),
or allocating memory and reading four bytes into memory, and access
it using bytes, then at offset 0, you will find 0xAA, offset 1 will
be 0xBB, etc.

The problems with endianness start when you move away from byte
access.

If you use 16-bit accessors, then, a little endian machine is defined
that a 16-bit load from memory will result in the first byte being put
into the LSB of the 16-bit value, and the second byte will be put into
the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
endian machine, a 16-bit load will result in the first byte being put
into the MSB of the 16-bit value, and the second byte will be put into
the LSB of that value - meaning the 16-bit value will be 0xAABB.

The second 16-bit value uses the next two bytes, and the order at which
these two bytes are placed into the 16-bit value reflects the same as
the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.

The same "swapping" happens with 32-bit, but of course instead of just
two bytes, it covers four bytes. On LE, a 32-bit access will give
0xDDCCBBAA. On BE, that will be 0xAABBCCDD.

Again, this is not to do with any kind of "conversion" happening in
software. It's a property of how the memory subsystem inside the CPU
works.

> Again this is really dumping raw data from the read file directly to the
> mailbox. Unless phy_write does some conversion internally, but in that
> case how does it know what endian is the PHY internally?

phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
register value will be transferred, and the MDIO bus specifies that
bit 15 will be sent first, followed by subsequent bits down to bit 0.

The access to the hardware to make this happen is required to ensure
that the value passed to phy_write() and read using phy_read() will
reflect this. So, if one does this:

	val = phy_read(phydev, 0);

	for (i = 15; i >= 0; i--)
		printk("%u", !!(val & BIT(i)));

	printk("\n");

This will give you the stream of bits in the _order_ that they appeared
on the MDIO bus when phy_read() accessed. Doing the same with a value
to be written will produce the bits in the same value that they will
be placed on the MDIO bus.

So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
in that register. The host endian is entirely irrelevant here.

> > I would make this explicit:
> > 
> > 		u8 crc_data[4];
> > 
> > 		...
> > 
> > 		/* CRC is calculated using BE order */
> > 		crc_data[0] = word >> 24;
> > 		crc_data[1] = word >> 16;
> > 		crc_data[2] = word >> 8;
> > 		crc_data[3] = word;
> > 
> > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > 
> > which will be (a) completely unambiguous, and (b) completely
> > independent of the host endianness.
> 
> But isn't this exactly what is done with ___constant_swab32 ?
> __swab32 should not change if the HOST is BE or LE.

Let try again to make this clear. If one has this code:

		u32 word = 0x01020304;
		u8 *ptr;
		int i;

		ptr = (u8 *)&word;

		for (i = 0; i < 4; i++)
			printk(" %02x", ptr[i]);
		printk("\n");

Then, on a:
- LE machine, this will print " 04 03 02 01"
- BE machine, this will print " 01 02 03 04"

Now, if you look at the definition of crc_ccitt_false(), it is
defined to do:

        while (len--)
                crc = crc_ccitt_false_byte(crc, *buffer++);

So, on a LE machine, this will feed the above bytes in the order of
0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
on a BE machine.

> The real question is if word is converted. (by either the read API on
> reading the FW or by phy_write on writing the thing to mailbox) (the
> test are done on a LE HOST)

There are no conversions - where a conversion I define as something
that the software explicitly has to do rather than what the underlying
machine hardware does.

> Our theory is that mailbox takes LE and internally converts to BE (as
> the PHY is BE) but the CRC reg calculates the CRC out of the converted
> data aka it does calculates the CRC from the BE data (converted
> internally).

I think the talk about the endian-ness of the PHY is entirely
unhelpful and is probably adding to confusion. The endian-ness of the
PHY is *not* exposed to the host because the MDIO interface to the PHY
is defined in terms of 16-bit register quantities, and bit 0 of the
register will be bit 0 on the host irrespective of host endian.
Christian Marangi Nov. 22, 2023, 7:55 p.m. UTC | #6
On Wed, Nov 22, 2023 at 06:53:50PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> > On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > > On further testing on BE target with kernel test robot, it was notice
> > > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > > wrong and actually not needed. Values in define doesn't get converted
> > > > and are passed as is and hardcoded values are already in what the PHY
> > > > require, that is LE.
> > > > 
> > > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > > _swab32 instead, the word is taked from firmware and is always LE, the
> > > 
> > >                                taken
> > > 
> > > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > > the endianness of the host needs to be ignored.
> > > 
> > > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > > 
> > > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > > 
> > > So, endianness matters here, and I think as Jakub already suggested, you
> > > need to use get_unaligned_le32().
> > >
> > 
> > So they DO get converted to the HOST endian on reading the firmware from
> > an nvmem cell or a filesystem?
> 
> I don't like "converted". It's *not* a conversion. It's a fundamental
> property of accessing memory using different sizes of access.
> 
> As I attempted to explain above, if you have a file, and byte 0
> contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
> 0xCC, and byte 3 contains 0xDD, then if you read that file byte by
> byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.
> 
> If you map that file into memory, e.g. in userspace, using mmap(),
> or allocating memory and reading four bytes into memory, and access
> it using bytes, then at offset 0, you will find 0xAA, offset 1 will
> be 0xBB, etc.
> 
> The problems with endianness start when you move away from byte
> access.
> 
> If you use 16-bit accessors, then, a little endian machine is defined
> that a 16-bit load from memory will result in the first byte being put
> into the LSB of the 16-bit value, and the second byte will be put into
> the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
> endian machine, a 16-bit load will result in the first byte being put
> into the MSB of the 16-bit value, and the second byte will be put into
> the LSB of that value - meaning the 16-bit value will be 0xAABB.
> 
> The second 16-bit value uses the next two bytes, and the order at which
> these two bytes are placed into the 16-bit value reflects the same as
> the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.
> 
> The same "swapping" happens with 32-bit, but of course instead of just
> two bytes, it covers four bytes. On LE, a 32-bit access will give
> 0xDDCCBBAA. On BE, that will be 0xAABBCCDD.
> 
> Again, this is not to do with any kind of "conversion" happening in
> software. It's a property of how the memory subsystem inside the CPU
> works.
> 
> > Again this is really dumping raw data from the read file directly to the
> > mailbox. Unless phy_write does some conversion internally, but in that
> > case how does it know what endian is the PHY internally?
> 
> phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
> register value will be transferred, and the MDIO bus specifies that
> bit 15 will be sent first, followed by subsequent bits down to bit 0.
> 
> The access to the hardware to make this happen is required to ensure
> that the value passed to phy_write() and read using phy_read() will
> reflect this. So, if one does this:
> 
> 	val = phy_read(phydev, 0);
> 
> 	for (i = 15; i >= 0; i--)
> 		printk("%u", !!(val & BIT(i)));
> 
> 	printk("\n");
> 
> This will give you the stream of bits in the _order_ that they appeared
> on the MDIO bus when phy_read() accessed. Doing the same with a value
> to be written will produce the bits in the same value that they will
> be placed on the MDIO bus.
> 
> So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
> will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
> in that register. The host endian is entirely irrelevant here.
>

Thanks a lot for the clarification. And sorry for misusing the word
conversion.

> > > I would make this explicit:
> > > 
> > > 		u8 crc_data[4];
> > > 
> > > 		...
> > > 
> > > 		/* CRC is calculated using BE order */
> > > 		crc_data[0] = word >> 24;
> > > 		crc_data[1] = word >> 16;
> > > 		crc_data[2] = word >> 8;
> > > 		crc_data[3] = word;
> > > 
> > > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > > 
> > > which will be (a) completely unambiguous, and (b) completely
> > > independent of the host endianness.
> > 
> > But isn't this exactly what is done with ___constant_swab32 ?
> > __swab32 should not change if the HOST is BE or LE.
> 
> Let try again to make this clear. If one has this code:
> 
> 		u32 word = 0x01020304;
> 		u8 *ptr;
> 		int i;
> 
> 		ptr = (u8 *)&word;
> 
> 		for (i = 0; i < 4; i++)
> 			printk(" %02x", ptr[i]);
> 		printk("\n");
> 
> Then, on a:
> - LE machine, this will print " 04 03 02 01"
> - BE machine, this will print " 01 02 03 04"
> 
> Now, if you look at the definition of crc_ccitt_false(), it is
> defined to do:
> 
>         while (len--)
>                 crc = crc_ccitt_false_byte(crc, *buffer++);
> 
> So, on a LE machine, this will feed the above bytes in the order of
> 0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
> on a BE machine.
> 

So it's really a problem of setting u8 in word and the order they are
read in the system.

Tell me if I'm wrong.

The first get_unaligned has to be changed to get_unaligned_le32 based on
how the data are treated from passing from an u8 to u32.

For LE this doesn't matter but for BE they needs to be swapped as this
is what mailbox expect.

For CRC. Would something like this work?

Define u8 crc_data[4];

*crc_data = (__force u32)cpu_to_be32(word);

crc = crc_ccitt_false(crc, crc_data, sizeof(word));

Using u8 array should keep the correct order no matter the endian and
cpu_to_be32 should correctly swap the word if needed. (in a BE HOST data
should already be in the right order and in LE has to be swapped right?)

> > The real question is if word is converted. (by either the read API on
> > reading the FW or by phy_write on writing the thing to mailbox) (the
> > test are done on a LE HOST)
> 
> There are no conversions - where a conversion I define as something
> that the software explicitly has to do rather than what the underlying
> machine hardware does.
> 
> > Our theory is that mailbox takes LE and internally converts to BE (as
> > the PHY is BE) but the CRC reg calculates the CRC out of the converted
> > data aka it does calculates the CRC from the BE data (converted
> > internally).
> 
> I think the talk about the endian-ness of the PHY is entirely
> unhelpful and is probably adding to confusion. The endian-ness of the
> PHY is *not* exposed to the host because the MDIO interface to the PHY
> is defined in terms of 16-bit register quantities, and bit 0 of the
> register will be bit 0 on the host irrespective of host endian.
>
Russell King (Oracle) Nov. 22, 2023, 8:25 p.m. UTC | #7
On Wed, Nov 22, 2023 at 08:55:17PM +0100, Christian Marangi wrote:
> On Wed, Nov 22, 2023 at 06:53:50PM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> > > On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > > > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > > > On further testing on BE target with kernel test robot, it was notice
> > > > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > > > wrong and actually not needed. Values in define doesn't get converted
> > > > > and are passed as is and hardcoded values are already in what the PHY
> > > > > require, that is LE.
> > > > > 
> > > > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > > > _swab32 instead, the word is taked from firmware and is always LE, the
> > > > 
> > > >                                taken
> > > > 
> > > > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > > > the endianness of the host needs to be ignored.
> > > > 
> > > > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > > > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > > > 
> > > > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > > > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > > > 
> > > > So, endianness matters here, and I think as Jakub already suggested, you
> > > > need to use get_unaligned_le32().
> > > >
> > > 
> > > So they DO get converted to the HOST endian on reading the firmware from
> > > an nvmem cell or a filesystem?
> > 
> > I don't like "converted". It's *not* a conversion. It's a fundamental
> > property of accessing memory using different sizes of access.
> > 
> > As I attempted to explain above, if you have a file, and byte 0
> > contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
> > 0xCC, and byte 3 contains 0xDD, then if you read that file byte by
> > byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.
> > 
> > If you map that file into memory, e.g. in userspace, using mmap(),
> > or allocating memory and reading four bytes into memory, and access
> > it using bytes, then at offset 0, you will find 0xAA, offset 1 will
> > be 0xBB, etc.
> > 
> > The problems with endianness start when you move away from byte
> > access.
> > 
> > If you use 16-bit accessors, then, a little endian machine is defined
> > that a 16-bit load from memory will result in the first byte being put
> > into the LSB of the 16-bit value, and the second byte will be put into
> > the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
> > endian machine, a 16-bit load will result in the first byte being put
> > into the MSB of the 16-bit value, and the second byte will be put into
> > the LSB of that value - meaning the 16-bit value will be 0xAABB.
> > 
> > The second 16-bit value uses the next two bytes, and the order at which
> > these two bytes are placed into the 16-bit value reflects the same as
> > the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.
> > 
> > The same "swapping" happens with 32-bit, but of course instead of just
> > two bytes, it covers four bytes. On LE, a 32-bit access will give
> > 0xDDCCBBAA. On BE, that will be 0xAABBCCDD.
> > 
> > Again, this is not to do with any kind of "conversion" happening in
> > software. It's a property of how the memory subsystem inside the CPU
> > works.
> > 
> > > Again this is really dumping raw data from the read file directly to the
> > > mailbox. Unless phy_write does some conversion internally, but in that
> > > case how does it know what endian is the PHY internally?
> > 
> > phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
> > register value will be transferred, and the MDIO bus specifies that
> > bit 15 will be sent first, followed by subsequent bits down to bit 0.
> > 
> > The access to the hardware to make this happen is required to ensure
> > that the value passed to phy_write() and read using phy_read() will
> > reflect this. So, if one does this:
> > 
> > 	val = phy_read(phydev, 0);
> > 
> > 	for (i = 15; i >= 0; i--)
> > 		printk("%u", !!(val & BIT(i)));
> > 
> > 	printk("\n");
> > 
> > This will give you the stream of bits in the _order_ that they appeared
> > on the MDIO bus when phy_read() accessed. Doing the same with a value
> > to be written will produce the bits in the same value that they will
> > be placed on the MDIO bus.
> > 
> > So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
> > will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
> > in that register. The host endian is entirely irrelevant here.
> >
> 
> Thanks a lot for the clarification. And sorry for misusing the word
> conversion.
> 
> > > > I would make this explicit:
> > > > 
> > > > 		u8 crc_data[4];
> > > > 
> > > > 		...
> > > > 
> > > > 		/* CRC is calculated using BE order */
> > > > 		crc_data[0] = word >> 24;
> > > > 		crc_data[1] = word >> 16;
> > > > 		crc_data[2] = word >> 8;
> > > > 		crc_data[3] = word;
> > > > 
> > > > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > > > 
> > > > which will be (a) completely unambiguous, and (b) completely
> > > > independent of the host endianness.
> > > 
> > > But isn't this exactly what is done with ___constant_swab32 ?
> > > __swab32 should not change if the HOST is BE or LE.
> > 
> > Let try again to make this clear. If one has this code:
> > 
> > 		u32 word = 0x01020304;
> > 		u8 *ptr;
> > 		int i;
> > 
> > 		ptr = (u8 *)&word;
> > 
> > 		for (i = 0; i < 4; i++)
> > 			printk(" %02x", ptr[i]);
> > 		printk("\n");
> > 
> > Then, on a:
> > - LE machine, this will print " 04 03 02 01"
> > - BE machine, this will print " 01 02 03 04"
> > 
> > Now, if you look at the definition of crc_ccitt_false(), it is
> > defined to do:
> > 
> >         while (len--)
> >                 crc = crc_ccitt_false_byte(crc, *buffer++);
> > 
> > So, on a LE machine, this will feed the above bytes in the order of
> > 0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
> > on a BE machine.
> > 
> 
> So it's really a problem of setting u8 in word and the order they are
> read in the system.

Correct.

> The first get_unaligned has to be changed to get_unaligned_le32 based on
> how the data are treated from passing from an u8 to u32.

Yes.

I'm going to use the term "bytestream", abbreviated to just stream, to
represent the firmware that you are going to upload, because that's
essentially what all files are.

 the first byte of the stream to appear in bits 7:0 of
   VEND1_GLOBAL_MAILBOX_INTERFACE6

 the second byte of the stream to appear in bits 15:8 of
   VEND1_GLOBAL_MAILBOX_INTERFACE6

 the third byte of the stream to appear in bits 7:0 of
   VEND1_GLOBAL_MAILBOX_INTERFACE5

 the forth byte of the stream to appear in bits 15:8 of
   VEND1_GLOBAL_MAILBOX_INTERFACE5

and this to repeat over subsequent groups of four bytes in the stream.

This will be achieved by reading the stream using 32-bit little endian
accesses using get_unaligned_le32(), and then as you are already doing,
splitting them up into two 16-bit quantities.

> For LE this doesn't matter but for BE they needs to be swapped as this
> is what mailbox expect.

Correct.

> For CRC. Would something like this work?
> 
> Define u8 crc_data[4];
> 
> *crc_data = (__force u32)cpu_to_be32(word);

That won't do what you want, it will only write the first byte.

> crc = crc_ccitt_false(crc, crc_data, sizeof(word));

The point of explicitly assigning each byte is to ensure that it's
obvious that we'll get the right result. If we try to write a 32-bit
value, then we're getting right back into the "how does _this_ CPU
map a 32-bit value to indivudual bytes" endianness problem.

The advantage of writing it out as bytes into a u8 array is that from
a code readability point of view, it's all laid out in plain sight
exactly which part of the 32-bit value ends up where and the order in
which the crc function is going to read those bytes - and it is
independent of whatever the endianess of the host architecture.

> Using u8 array should keep the correct order no matter the endian and
> cpu_to_be32 should correctly swap the word if needed. (in a BE HOST data
> should already be in the right order and in LE has to be swapped right?)

If you are absolutely certain that each group of four bytes in the
source bytestream need to be provided to the CRC function in the
reverse order to which they appear in the file.
Christian Marangi Nov. 22, 2023, 9:09 p.m. UTC | #8
On Wed, Nov 22, 2023 at 08:25:16PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 22, 2023 at 08:55:17PM +0100, Christian Marangi wrote:
> > On Wed, Nov 22, 2023 at 06:53:50PM +0000, Russell King (Oracle) wrote:
> > > On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> > > > On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > > > > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > > > > On further testing on BE target with kernel test robot, it was notice
> > > > > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > > > > wrong and actually not needed. Values in define doesn't get converted
> > > > > > and are passed as is and hardcoded values are already in what the PHY
> > > > > > require, that is LE.
> > > > > > 
> > > > > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > > > > _swab32 instead, the word is taked from firmware and is always LE, the
> > > > > 
> > > > >                                taken
> > > > > 
> > > > > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > > > > the endianness of the host needs to be ignored.
> > > > > 
> > > > > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > > > > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > > > > 
> > > > > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > > > > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > > > > 
> > > > > So, endianness matters here, and I think as Jakub already suggested, you
> > > > > need to use get_unaligned_le32().
> > > > >
> > > > 
> > > > So they DO get converted to the HOST endian on reading the firmware from
> > > > an nvmem cell or a filesystem?
> > > 
> > > I don't like "converted". It's *not* a conversion. It's a fundamental
> > > property of accessing memory using different sizes of access.
> > > 
> > > As I attempted to explain above, if you have a file, and byte 0
> > > contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
> > > 0xCC, and byte 3 contains 0xDD, then if you read that file byte by
> > > byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.
> > > 
> > > If you map that file into memory, e.g. in userspace, using mmap(),
> > > or allocating memory and reading four bytes into memory, and access
> > > it using bytes, then at offset 0, you will find 0xAA, offset 1 will
> > > be 0xBB, etc.
> > > 
> > > The problems with endianness start when you move away from byte
> > > access.
> > > 
> > > If you use 16-bit accessors, then, a little endian machine is defined
> > > that a 16-bit load from memory will result in the first byte being put
> > > into the LSB of the 16-bit value, and the second byte will be put into
> > > the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
> > > endian machine, a 16-bit load will result in the first byte being put
> > > into the MSB of the 16-bit value, and the second byte will be put into
> > > the LSB of that value - meaning the 16-bit value will be 0xAABB.
> > > 
> > > The second 16-bit value uses the next two bytes, and the order at which
> > > these two bytes are placed into the 16-bit value reflects the same as
> > > the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.
> > > 
> > > The same "swapping" happens with 32-bit, but of course instead of just
> > > two bytes, it covers four bytes. On LE, a 32-bit access will give
> > > 0xDDCCBBAA. On BE, that will be 0xAABBCCDD.
> > > 
> > > Again, this is not to do with any kind of "conversion" happening in
> > > software. It's a property of how the memory subsystem inside the CPU
> > > works.
> > > 
> > > > Again this is really dumping raw data from the read file directly to the
> > > > mailbox. Unless phy_write does some conversion internally, but in that
> > > > case how does it know what endian is the PHY internally?
> > > 
> > > phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
> > > register value will be transferred, and the MDIO bus specifies that
> > > bit 15 will be sent first, followed by subsequent bits down to bit 0.
> > > 
> > > The access to the hardware to make this happen is required to ensure
> > > that the value passed to phy_write() and read using phy_read() will
> > > reflect this. So, if one does this:
> > > 
> > > 	val = phy_read(phydev, 0);
> > > 
> > > 	for (i = 15; i >= 0; i--)
> > > 		printk("%u", !!(val & BIT(i)));
> > > 
> > > 	printk("\n");
> > > 
> > > This will give you the stream of bits in the _order_ that they appeared
> > > on the MDIO bus when phy_read() accessed. Doing the same with a value
> > > to be written will produce the bits in the same value that they will
> > > be placed on the MDIO bus.
> > > 
> > > So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
> > > will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
> > > in that register. The host endian is entirely irrelevant here.
> > >
> > 
> > Thanks a lot for the clarification. And sorry for misusing the word
> > conversion.
> > 
> > > > > I would make this explicit:
> > > > > 
> > > > > 		u8 crc_data[4];
> > > > > 
> > > > > 		...
> > > > > 
> > > > > 		/* CRC is calculated using BE order */
> > > > > 		crc_data[0] = word >> 24;
> > > > > 		crc_data[1] = word >> 16;
> > > > > 		crc_data[2] = word >> 8;
> > > > > 		crc_data[3] = word;
> > > > > 
> > > > > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > > > > 
> > > > > which will be (a) completely unambiguous, and (b) completely
> > > > > independent of the host endianness.
> > > > 
> > > > But isn't this exactly what is done with ___constant_swab32 ?
> > > > __swab32 should not change if the HOST is BE or LE.
> > > 
> > > Let try again to make this clear. If one has this code:
> > > 
> > > 		u32 word = 0x01020304;
> > > 		u8 *ptr;
> > > 		int i;
> > > 
> > > 		ptr = (u8 *)&word;
> > > 
> > > 		for (i = 0; i < 4; i++)
> > > 			printk(" %02x", ptr[i]);
> > > 		printk("\n");
> > > 
> > > Then, on a:
> > > - LE machine, this will print " 04 03 02 01"
> > > - BE machine, this will print " 01 02 03 04"
> > > 
> > > Now, if you look at the definition of crc_ccitt_false(), it is
> > > defined to do:
> > > 
> > >         while (len--)
> > >                 crc = crc_ccitt_false_byte(crc, *buffer++);
> > > 
> > > So, on a LE machine, this will feed the above bytes in the order of
> > > 0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
> > > on a BE machine.
> > > 
> > 
> > So it's really a problem of setting u8 in word and the order they are
> > read in the system.
> 
> Correct.
> 
> > The first get_unaligned has to be changed to get_unaligned_le32 based on
> > how the data are treated from passing from an u8 to u32.
> 
> Yes.
> 
> I'm going to use the term "bytestream", abbreviated to just stream, to
> represent the firmware that you are going to upload, because that's
> essentially what all files are.
> 
>  the first byte of the stream to appear in bits 7:0 of
>    VEND1_GLOBAL_MAILBOX_INTERFACE6
> 
>  the second byte of the stream to appear in bits 15:8 of
>    VEND1_GLOBAL_MAILBOX_INTERFACE6
> 
>  the third byte of the stream to appear in bits 7:0 of
>    VEND1_GLOBAL_MAILBOX_INTERFACE5
> 
>  the forth byte of the stream to appear in bits 15:8 of
>    VEND1_GLOBAL_MAILBOX_INTERFACE5
> 
> and this to repeat over subsequent groups of four bytes in the stream.
> 
> This will be achieved by reading the stream using 32-bit little endian
> accesses using get_unaligned_le32(), and then as you are already doing,
> splitting them up into two 16-bit quantities.
> 
> > For LE this doesn't matter but for BE they needs to be swapped as this
> > is what mailbox expect.
> 
> Correct.
> 
> > For CRC. Would something like this work?
> > 
> > Define u8 crc_data[4];
> > 
> > *crc_data = (__force u32)cpu_to_be32(word);
> 
> That won't do what you want, it will only write the first byte.
>

Right I'm stupid...

Just an example, the correct way would have been...

u8 crc_data[4];
u32 *crc_word;
u32 word;

crc_word = (u32 *)crc_data;
*crc_word = (__force u32)cpu_to_be32(word);

crc = crc_ccitt_false(crc, crc_data, sizeof(word));

> > crc = crc_ccitt_false(crc, crc_data, sizeof(word));
> 
> The point of explicitly assigning each byte is to ensure that it's
> obvious that we'll get the right result. If we try to write a 32-bit
> value, then we're getting right back into the "how does _this_ CPU
> map a 32-bit value to indivudual bytes" endianness problem.
> 
> The advantage of writing it out as bytes into a u8 array is that from
> a code readability point of view, it's all laid out in plain sight
> exactly which part of the 32-bit value ends up where and the order in
> which the crc function is going to read those bytes - and it is
> independent of whatever the endianess of the host architecture.
> 

It's just that I would really love to have a way to skip the shift
maddness. If the code above is correct really don't know what is
better... probably yours... just I don't like those shift.

I wonder... can't I reuse get_unaligned_be32((const u32 *)(data + pos));
???

> > Using u8 array should keep the correct order no matter the endian and
> > cpu_to_be32 should correctly swap the word if needed. (in a BE HOST data
> > should already be in the right order and in LE has to be swapped right?)
> 
> If you are absolutely certain that each group of four bytes in the
> source bytestream need to be provided to the CRC function in the
> reverse order to which they appear in the file.
>
Russell King (Oracle) Nov. 22, 2023, 10:31 p.m. UTC | #9
On Wed, Nov 22, 2023 at 10:09:54PM +0100, Christian Marangi wrote:
> On Wed, Nov 22, 2023 at 08:25:16PM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 22, 2023 at 08:55:17PM +0100, Christian Marangi wrote:
> > > On Wed, Nov 22, 2023 at 06:53:50PM +0000, Russell King (Oracle) wrote:
> > > > On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> > > > > On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > > > > > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > > > > > On further testing on BE target with kernel test robot, it was notice
> > > > > > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > > > > > wrong and actually not needed. Values in define doesn't get converted
> > > > > > > and are passed as is and hardcoded values are already in what the PHY
> > > > > > > require, that is LE.
> > > > > > > 
> > > > > > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > > > > > _swab32 instead, the word is taked from firmware and is always LE, the
> > > > > > 
> > > > > >                                taken
> > > > > > 
> > > > > > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > > > > > the endianness of the host needs to be ignored.
> > > > > > 
> > > > > > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > > > > > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > > > > > 
> > > > > > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > > > > > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > > > > > 
> > > > > > So, endianness matters here, and I think as Jakub already suggested, you
> > > > > > need to use get_unaligned_le32().
> > > > > >
> > > > > 
> > > > > So they DO get converted to the HOST endian on reading the firmware from
> > > > > an nvmem cell or a filesystem?
> > > > 
> > > > I don't like "converted". It's *not* a conversion. It's a fundamental
> > > > property of accessing memory using different sizes of access.
> > > > 
> > > > As I attempted to explain above, if you have a file, and byte 0
> > > > contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
> > > > 0xCC, and byte 3 contains 0xDD, then if you read that file byte by
> > > > byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.
> > > > 
> > > > If you map that file into memory, e.g. in userspace, using mmap(),
> > > > or allocating memory and reading four bytes into memory, and access
> > > > it using bytes, then at offset 0, you will find 0xAA, offset 1 will
> > > > be 0xBB, etc.
> > > > 
> > > > The problems with endianness start when you move away from byte
> > > > access.
> > > > 
> > > > If you use 16-bit accessors, then, a little endian machine is defined
> > > > that a 16-bit load from memory will result in the first byte being put
> > > > into the LSB of the 16-bit value, and the second byte will be put into
> > > > the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
> > > > endian machine, a 16-bit load will result in the first byte being put
> > > > into the MSB of the 16-bit value, and the second byte will be put into
> > > > the LSB of that value - meaning the 16-bit value will be 0xAABB.
> > > > 
> > > > The second 16-bit value uses the next two bytes, and the order at which
> > > > these two bytes are placed into the 16-bit value reflects the same as
> > > > the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.
> > > > 
> > > > The same "swapping" happens with 32-bit, but of course instead of just
> > > > two bytes, it covers four bytes. On LE, a 32-bit access will give
> > > > 0xDDCCBBAA. On BE, that will be 0xAABBCCDD.
> > > > 
> > > > Again, this is not to do with any kind of "conversion" happening in
> > > > software. It's a property of how the memory subsystem inside the CPU
> > > > works.
> > > > 
> > > > > Again this is really dumping raw data from the read file directly to the
> > > > > mailbox. Unless phy_write does some conversion internally, but in that
> > > > > case how does it know what endian is the PHY internally?
> > > > 
> > > > phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
> > > > register value will be transferred, and the MDIO bus specifies that
> > > > bit 15 will be sent first, followed by subsequent bits down to bit 0.
> > > > 
> > > > The access to the hardware to make this happen is required to ensure
> > > > that the value passed to phy_write() and read using phy_read() will
> > > > reflect this. So, if one does this:
> > > > 
> > > > 	val = phy_read(phydev, 0);
> > > > 
> > > > 	for (i = 15; i >= 0; i--)
> > > > 		printk("%u", !!(val & BIT(i)));
> > > > 
> > > > 	printk("\n");
> > > > 
> > > > This will give you the stream of bits in the _order_ that they appeared
> > > > on the MDIO bus when phy_read() accessed. Doing the same with a value
> > > > to be written will produce the bits in the same value that they will
> > > > be placed on the MDIO bus.
> > > > 
> > > > So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
> > > > will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
> > > > in that register. The host endian is entirely irrelevant here.
> > > >
> > > 
> > > Thanks a lot for the clarification. And sorry for misusing the word
> > > conversion.
> > > 
> > > > > > I would make this explicit:
> > > > > > 
> > > > > > 		u8 crc_data[4];
> > > > > > 
> > > > > > 		...
> > > > > > 
> > > > > > 		/* CRC is calculated using BE order */
> > > > > > 		crc_data[0] = word >> 24;
> > > > > > 		crc_data[1] = word >> 16;
> > > > > > 		crc_data[2] = word >> 8;
> > > > > > 		crc_data[3] = word;
> > > > > > 
> > > > > > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > > > > > 
> > > > > > which will be (a) completely unambiguous, and (b) completely
> > > > > > independent of the host endianness.
> > > > > 
> > > > > But isn't this exactly what is done with ___constant_swab32 ?
> > > > > __swab32 should not change if the HOST is BE or LE.
> > > > 
> > > > Let try again to make this clear. If one has this code:
> > > > 
> > > > 		u32 word = 0x01020304;
> > > > 		u8 *ptr;
> > > > 		int i;
> > > > 
> > > > 		ptr = (u8 *)&word;
> > > > 
> > > > 		for (i = 0; i < 4; i++)
> > > > 			printk(" %02x", ptr[i]);
> > > > 		printk("\n");
> > > > 
> > > > Then, on a:
> > > > - LE machine, this will print " 04 03 02 01"
> > > > - BE machine, this will print " 01 02 03 04"
> > > > 
> > > > Now, if you look at the definition of crc_ccitt_false(), it is
> > > > defined to do:
> > > > 
> > > >         while (len--)
> > > >                 crc = crc_ccitt_false_byte(crc, *buffer++);
> > > > 
> > > > So, on a LE machine, this will feed the above bytes in the order of
> > > > 0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
> > > > on a BE machine.
> > > > 
> > > 
> > > So it's really a problem of setting u8 in word and the order they are
> > > read in the system.
> > 
> > Correct.
> > 
> > > The first get_unaligned has to be changed to get_unaligned_le32 based on
> > > how the data are treated from passing from an u8 to u32.
> > 
> > Yes.
> > 
> > I'm going to use the term "bytestream", abbreviated to just stream, to
> > represent the firmware that you are going to upload, because that's
> > essentially what all files are.
> > 
> >  the first byte of the stream to appear in bits 7:0 of
> >    VEND1_GLOBAL_MAILBOX_INTERFACE6
> > 
> >  the second byte of the stream to appear in bits 15:8 of
> >    VEND1_GLOBAL_MAILBOX_INTERFACE6
> > 
> >  the third byte of the stream to appear in bits 7:0 of
> >    VEND1_GLOBAL_MAILBOX_INTERFACE5
> > 
> >  the forth byte of the stream to appear in bits 15:8 of
> >    VEND1_GLOBAL_MAILBOX_INTERFACE5
> > 
> > and this to repeat over subsequent groups of four bytes in the stream.
> > 
> > This will be achieved by reading the stream using 32-bit little endian
> > accesses using get_unaligned_le32(), and then as you are already doing,
> > splitting them up into two 16-bit quantities.
> > 
> > > For LE this doesn't matter but for BE they needs to be swapped as this
> > > is what mailbox expect.
> > 
> > Correct.
> > 
> > > For CRC. Would something like this work?
> > > 
> > > Define u8 crc_data[4];
> > > 
> > > *crc_data = (__force u32)cpu_to_be32(word);
> > 
> > That won't do what you want, it will only write the first byte.
> >
> 
> Right I'm stupid...
> 
> Just an example, the correct way would have been...
> 
> u8 crc_data[4];
> u32 *crc_word;
> u32 word;
> 
> crc_word = (u32 *)crc_data;
> *crc_word = (__force u32)cpu_to_be32(word);

So, let's say "word" was originally 0x12345678.

cpu_to_be32() on LE will make this 0x78563412, and that will be stored
in the array as 0x12, 0x34, 0x56, 0x78.

cpu_to_be32() on BE will be a no-op, so this will remain as 0x12345678,
which will be stored in the array as 0x12, 0x34, 0x56, 0x78.

So yes, that looks like it will work, but there's another issue to
consider - whether this is legal C, or whether it is venturing into
undefined behaviour.

https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

> > If you are absolutely certain that each group of four bytes in the
> > source bytestream need to be provided to the CRC function in the
> > reverse order to which they appear in the file.

This is a point to which you unfortunately didn't reply, but is an
important one.

If the bytes in the bytestream are in the correct order for calculating
the CRC, then we can avoid all this and just call the CRC function over
the entire image without needing to do the word load and dance about
changing the order of the bytes - and this was why I mentioned this
hoping to prompt you to double-check.
Christian Marangi Nov. 22, 2023, 10:37 p.m. UTC | #10
On Wed, Nov 22, 2023 at 10:31:50PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 22, 2023 at 10:09:54PM +0100, Christian Marangi wrote:
> > On Wed, Nov 22, 2023 at 08:25:16PM +0000, Russell King (Oracle) wrote:
> > > On Wed, Nov 22, 2023 at 08:55:17PM +0100, Christian Marangi wrote:
> > > > On Wed, Nov 22, 2023 at 06:53:50PM +0000, Russell King (Oracle) wrote:
> > > > > On Wed, Nov 22, 2023 at 06:53:39PM +0100, Christian Marangi wrote:
> > > > > > On Wed, Nov 22, 2023 at 05:24:33PM +0000, Russell King (Oracle) wrote:
> > > > > > > On Wed, Nov 22, 2023 at 06:08:13PM +0100, Christian Marangi wrote:
> > > > > > > > On further testing on BE target with kernel test robot, it was notice
> > > > > > > > that the endianness conversion for addr and CRC in fw_load_memory was
> > > > > > > > wrong and actually not needed. Values in define doesn't get converted
> > > > > > > > and are passed as is and hardcoded values are already in what the PHY
> > > > > > > > require, that is LE.
> > > > > > > > 
> > > > > > > > Also drop the cpu_to_be32 for CRC calculation as it's wrong and use
> > > > > > > > _swab32 instead, the word is taked from firmware and is always LE, the
> > > > > > > 
> > > > > > >                                taken
> > > > > > > 
> > > > > > > > mailbox will emit a BE CRC hence the word needs to be always swapped and
> > > > > > > > the endianness of the host needs to be ignored.
> > > > > > > 
> > > > > > > I'm not convinced. If the firmware is a bytestream (as most "files" are)
> > > > > > > then for val = get_unaligned((u32 *)ptr), where ptr is an array of u8:
> > > > > > > 
> > > > > > > ptr[0]	ptr[1]	ptr[2]	ptr[3]	val on LE	val on BE
> > > > > > > 0x01	0x02	0x03	0x04	0x04030201	0x01020304
> > > > > > > 
> > > > > > > So, endianness matters here, and I think as Jakub already suggested, you
> > > > > > > need to use get_unaligned_le32().
> > > > > > >
> > > > > > 
> > > > > > So they DO get converted to the HOST endian on reading the firmware from
> > > > > > an nvmem cell or a filesystem?
> > > > > 
> > > > > I don't like "converted". It's *not* a conversion. It's a fundamental
> > > > > property of accessing memory using different sizes of access.
> > > > > 
> > > > > As I attempted to explain above, if you have a file, and byte 0
> > > > > contains 0xAA, byte 1 of the file contains 0xBB, byte 2 contains
> > > > > 0xCC, and byte 3 contains 0xDD, then if you read that file byte by
> > > > > byte, you will get 0xAA, then 0xBB, then 0xCC and then 0xDD.
> > > > > 
> > > > > If you map that file into memory, e.g. in userspace, using mmap(),
> > > > > or allocating memory and reading four bytes into memory, and access
> > > > > it using bytes, then at offset 0, you will find 0xAA, offset 1 will
> > > > > be 0xBB, etc.
> > > > > 
> > > > > The problems with endianness start when you move away from byte
> > > > > access.
> > > > > 
> > > > > If you use 16-bit accessors, then, a little endian machine is defined
> > > > > that a 16-bit load from memory will result in the first byte being put
> > > > > into the LSB of the 16-bit value, and the second byte will be put into
> > > > > the MSB of the 16-bit value. So that would be 0xBBAA. However, on a big
> > > > > endian machine, a 16-bit load will result in the first byte being put
> > > > > into the MSB of the 16-bit value, and the second byte will be put into
> > > > > the LSB of that value - meaning the 16-bit value will be 0xAABB.
> > > > > 
> > > > > The second 16-bit value uses the next two bytes, and the order at which
> > > > > these two bytes are placed into the 16-bit value reflects the same as
> > > > > the first two bytes. So LE will be 0xDDCC and BE would be 0xCCDD.
> > > > > 
> > > > > The same "swapping" happens with 32-bit, but of course instead of just
> > > > > two bytes, it covers four bytes. On LE, a 32-bit access will give
> > > > > 0xDDCCBBAA. On BE, that will be 0xAABBCCDD.
> > > > > 
> > > > > Again, this is not to do with any kind of "conversion" happening in
> > > > > software. It's a property of how the memory subsystem inside the CPU
> > > > > works.
> > > > > 
> > > > > > Again this is really dumping raw data from the read file directly to the
> > > > > > mailbox. Unless phy_write does some conversion internally, but in that
> > > > > > case how does it know what endian is the PHY internally?
> > > > > 
> > > > > phy_write() does *no* conversion. The MDIO bus defines that a 16-bit
> > > > > register value will be transferred, and the MDIO bus specifies that
> > > > > bit 15 will be sent first, followed by subsequent bits down to bit 0.
> > > > > 
> > > > > The access to the hardware to make this happen is required to ensure
> > > > > that the value passed to phy_write() and read using phy_read() will
> > > > > reflect this. So, if one does this:
> > > > > 
> > > > > 	val = phy_read(phydev, 0);
> > > > > 
> > > > > 	for (i = 15; i >= 0; i--)
> > > > > 		printk("%u", !!(val & BIT(i)));
> > > > > 
> > > > > 	printk("\n");
> > > > > 
> > > > > This will give you the stream of bits in the _order_ that they appeared
> > > > > on the MDIO bus when phy_read() accessed. Doing the same with a value
> > > > > to be written will produce the bits in the same value that they will
> > > > > be placed on the MDIO bus.
> > > > > 
> > > > > So, this means that if the BMCR contains 0x1234 in the PHY, phy_read()
> > > > > will return 0x1234. Passing 0x1234 into phy_write() will write 0x1234
> > > > > in that register. The host endian is entirely irrelevant here.
> > > > >
> > > > 
> > > > Thanks a lot for the clarification. And sorry for misusing the word
> > > > conversion.
> > > > 
> > > > > > > I would make this explicit:
> > > > > > > 
> > > > > > > 		u8 crc_data[4];
> > > > > > > 
> > > > > > > 		...
> > > > > > > 
> > > > > > > 		/* CRC is calculated using BE order */
> > > > > > > 		crc_data[0] = word >> 24;
> > > > > > > 		crc_data[1] = word >> 16;
> > > > > > > 		crc_data[2] = word >> 8;
> > > > > > > 		crc_data[3] = word;
> > > > > > > 
> > > > > > > 		crc = crc_ccitt_false(crc, crc_data, sizeof(crc_data));
> > > > > > > 
> > > > > > > which will be (a) completely unambiguous, and (b) completely
> > > > > > > independent of the host endianness.
> > > > > > 
> > > > > > But isn't this exactly what is done with ___constant_swab32 ?
> > > > > > __swab32 should not change if the HOST is BE or LE.
> > > > > 
> > > > > Let try again to make this clear. If one has this code:
> > > > > 
> > > > > 		u32 word = 0x01020304;
> > > > > 		u8 *ptr;
> > > > > 		int i;
> > > > > 
> > > > > 		ptr = (u8 *)&word;
> > > > > 
> > > > > 		for (i = 0; i < 4; i++)
> > > > > 			printk(" %02x", ptr[i]);
> > > > > 		printk("\n");
> > > > > 
> > > > > Then, on a:
> > > > > - LE machine, this will print " 04 03 02 01"
> > > > > - BE machine, this will print " 01 02 03 04"
> > > > > 
> > > > > Now, if you look at the definition of crc_ccitt_false(), it is
> > > > > defined to do:
> > > > > 
> > > > >         while (len--)
> > > > >                 crc = crc_ccitt_false_byte(crc, *buffer++);
> > > > > 
> > > > > So, on a LE machine, this will feed the above bytes in the order of
> > > > > 0x04, 0x03, 0x02, 0x01 in a LE machine, and 0x01, 0x02, 0x03, 0x04
> > > > > on a BE machine.
> > > > > 
> > > > 
> > > > So it's really a problem of setting u8 in word and the order they are
> > > > read in the system.
> > > 
> > > Correct.
> > > 
> > > > The first get_unaligned has to be changed to get_unaligned_le32 based on
> > > > how the data are treated from passing from an u8 to u32.
> > > 
> > > Yes.
> > > 
> > > I'm going to use the term "bytestream", abbreviated to just stream, to
> > > represent the firmware that you are going to upload, because that's
> > > essentially what all files are.
> > > 
> > >  the first byte of the stream to appear in bits 7:0 of
> > >    VEND1_GLOBAL_MAILBOX_INTERFACE6
> > > 
> > >  the second byte of the stream to appear in bits 15:8 of
> > >    VEND1_GLOBAL_MAILBOX_INTERFACE6
> > > 
> > >  the third byte of the stream to appear in bits 7:0 of
> > >    VEND1_GLOBAL_MAILBOX_INTERFACE5
> > > 
> > >  the forth byte of the stream to appear in bits 15:8 of
> > >    VEND1_GLOBAL_MAILBOX_INTERFACE5
> > > 
> > > and this to repeat over subsequent groups of four bytes in the stream.
> > > 
> > > This will be achieved by reading the stream using 32-bit little endian
> > > accesses using get_unaligned_le32(), and then as you are already doing,
> > > splitting them up into two 16-bit quantities.
> > > 
> > > > For LE this doesn't matter but for BE they needs to be swapped as this
> > > > is what mailbox expect.
> > > 
> > > Correct.
> > > 
> > > > For CRC. Would something like this work?
> > > > 
> > > > Define u8 crc_data[4];
> > > > 
> > > > *crc_data = (__force u32)cpu_to_be32(word);
> > > 
> > > That won't do what you want, it will only write the first byte.
> > >
> > 
> > Right I'm stupid...
> > 
> > Just an example, the correct way would have been...
> > 
> > u8 crc_data[4];
> > u32 *crc_word;
> > u32 word;
> > 
> > crc_word = (u32 *)crc_data;
> > *crc_word = (__force u32)cpu_to_be32(word);
> 
> So, let's say "word" was originally 0x12345678.
> 
> cpu_to_be32() on LE will make this 0x78563412, and that will be stored
> in the array as 0x12, 0x34, 0x56, 0x78.
> 
> cpu_to_be32() on BE will be a no-op, so this will remain as 0x12345678,
> which will be stored in the array as 0x12, 0x34, 0x56, 0x78.
> 
> So yes, that looks like it will work, but there's another issue to
> consider - whether this is legal C, or whether it is venturing into
> undefined behaviour.
> 
> https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8
>

Yes I also have some doubt it's legal C... Did you miss the idea about
using get_unaligned_be32? Shouldn't that swap the bytes only on LE HOST
resulting in exactly what we want?

> > > If you are absolutely certain that each group of four bytes in the
> > > source bytestream need to be provided to the CRC function in the
> > > reverse order to which they appear in the file.
> 
> This is a point to which you unfortunately didn't reply, but is an
> important one.
> 
> If the bytes in the bytestream are in the correct order for calculating
> the CRC, then we can avoid all this and just call the CRC function over
> the entire image without needing to do the word load and dance about
> changing the order of the bytes - and this was why I mentioned this
> hoping to prompt you to double-check.
>

I have to double check.

The idea of calculating the crc as we read was to skip looping again.
(but since it's one only at probe it's not that valuable)

The fact that we would pass u8 to the crc function means that reading
it wouldn't change if host is BE or LE. Correct?
diff mbox series

Patch

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index c5f292b1c4c8..bd093633d0cf 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -93,9 +93,9 @@  static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
 	u16 crc = 0, up_crc;
 	size_t pos;
 
-	/* PHY expect addr in LE */
-	addr = (__force u32)cpu_to_le32(addr);
-
+	/* PHY expect addr in LE. Hardcoded addr in defines are
+	 * already in this format.
+	 */
 	phy_write_mmd(phydev, MDIO_MMD_VEND1,
 		      VEND1_GLOBAL_MAILBOX_INTERFACE1,
 		      VEND1_GLOBAL_MAILBOX_INTERFACE1_CRC_RESET);
@@ -128,7 +128,7 @@  static int aqr_fw_load_memory(struct phy_device *phydev, u32 addr,
 		 * We convert word to big-endian as PHY is BE and mailbox will
 		 * return a BE CRC.
 		 */
-		word = (__force u32)cpu_to_be32(word);
+		word = __swab32(word);
 		crc = crc_ccitt_false(crc, (u8 *)&word, sizeof(word));
 	}