[v2,0/6] spi-mem: Allow specifying the byte order in DTR mode

Message ID	20220311080147.453483-1-tudor.ambarus@microchip.com (mailing list archive)
Headers	show Return-Path: <linux-spi-owner@kernel.org> From: Tudor Ambarus <tudor.ambarus@microchip.com> To: <p.yadav@ti.com>, <michael@walle.cc>, <broonie@kernel.org> CC: <miquel.raynal@bootlin.com>, <richard@nod.at>, <vigneshr@ti.com>, <linux-mtd@lists.infradead.org>, <linux-kernel@vger.kernel.org>, <linux-spi@vger.kernel.org>, <nicolas.ferre@microchip.com>, Tudor Ambarus <tudor.ambarus@microchip.com> Subject: [PATCH v2 0/6] spi-mem: Allow specifying the byte order in DTR mode Date: Fri, 11 Mar 2022 10:01:41 +0200 Message-ID: <20220311080147.453483-1-tudor.ambarus@microchip.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk
Series	spi-mem: Allow specifying the byte order in DTR mode \| expand [v2,0/6] spi-mem: Allow specifying the byte order in DTR mode [v2,1/6] spi: spi-mem: Allow specifying the byte order in DTR mode [v2,2/6] mtd: spi-nor: core: Allow specifying the byte order in DTR mode [v2,3/6] mtd: spi-nor: sfdp: Get the 8D-8D-8D byte order from BFPT [v2,4/6] mtd: spi-nor: core: Introduce SPI_NOR_DTR_BSWAP16 no_sfdp_flag [v2,5/6] mtd: spi-nor: core: Introduce SPI_NOR_SOFT_RESET flash_info fixup_flag [v2,6/6] mtd: spi-nor: macronix: Add support for mx66lm1g45g

Tudor Ambarus March 11, 2022, 8:01 a.m. UTC

There are NOR flashes (Macronix) that swap the bytes on a 16-bit boundary
when configured in Octal DTR mode. The byte order of 16-bit words is
swapped when read or written in Octal Double Transfer Rate (DTR) mode
compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
read back D1 D0 D3 D2. Swapping the bytes is a bad design decision because
it may introduce some endianness problems. It can affect the boot sequence
if the entire boot sequence is not handled in either 8D-8D-8D mode or 1-1-1
mode. So we must swap the bytes back to have the same byte order as in STR
modes. Fortunately there are controllers that can swap the bytes back at
runtime, addressing the flash's endiannesses requirements.
If the controllers are not capable of swapping the bytes, the protocol is
downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the swapping
of the bytes is always done regardless if it's a data or register access,
so that we comply with the JESD216 requirements: "Byte order of 16-bit
words is swapped when read in 8D-8D-8D mode compared to 1-1-1".

Tested with atmel-quadspi driver, both in 8D-8D-8D mode and 1-1-1 mode
(via the protocol downgrade).

This patch set depends on the SPI MEM changes from linux-mtd/mtd/next and
also on some SPI NOR patches that are waiting for R-b tags. You can find
a branch if you want to test it at:

To github.com:ambarus/linux-0day.git
 * [new branch]                spi-nor/next-mx-byte-swap-v2 -> spi-nor/next-mx-byte-swap-v2

v2:
- all: s/bswap16/swab16 to comply with linux naming scheme
- SPI MEM: add dtr_swab16 cap to spi_controller_mem_caps and update
  spi_mem_default_supports_op() to check for it.
- SPI NOR: do not encode swab16 info in the SPI NOR protocol, use a helper
  instead
- SPI NOR: downgrade the SPI NOR protocol in case the SPI Controller does
  not support swab16
- add SPI_NOR_SOFT_RESET and support for mx66lm1g45g

Tudor Ambarus (6):
  spi: spi-mem: Allow specifying the byte order in DTR mode
  mtd: spi-nor: core: Allow specifying the byte order in DTR mode
  mtd: spi-nor: sfdp: Get the 8D-8D-8D byte order from BFPT
  mtd: spi-nor: core: Introduce SPI_NOR_DTR_BSWAP16 no_sfdp_flag
  mtd: spi-nor: core: Introduce SPI_NOR_SOFT_RESET flash_info fixup_flag
  mtd: spi-nor: macronix: Add support for mx66lm1g45g

 drivers/mtd/spi-nor/core.c     |  16 +++-
 drivers/mtd/spi-nor/core.h     |  10 ++-
 drivers/mtd/spi-nor/macronix.c | 130 +++++++++++++++++++++++++++++++++
 drivers/mtd/spi-nor/sfdp.c     |   4 +
 drivers/spi/spi-mem.c          |   4 +
 include/linux/spi/spi-mem.h    |   6 ++
 6 files changed, 168 insertions(+), 2 deletions(-)

Vignesh Raghavendra March 15, 2022, 6:08 a.m. UTC | #1

Hi,

On 11/03/22 1:31 pm, Tudor Ambarus wrote:
> There are NOR flashes (Macronix) that swap the bytes on a 16-bit boundary
> when configured in Octal DTR mode. The byte order of 16-bit words is
> swapped when read or written in Octal Double Transfer Rate (DTR) mode
> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision because
> it may introduce some endianness problems. It can affect the boot sequence
> if the entire boot sequence is not handled in either 8D-8D-8D mode or 1-1-1
> mode. So we must swap the bytes back to have the same byte order as in STR
> modes. Fortunately there are controllers that can swap the bytes back at
> runtime, addressing the flash's endiannesses requirements.
> If the controllers are not capable of swapping the bytes, the protocol is
> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the swapping
> of the bytes is always done regardless if it's a data or register access,
> so that we comply with the JESD216 requirements: "Byte order of 16-bit
> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
> 

Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
quite restrictive IMO.

AFAIK, SFDP standard does not dictate how data should be stored in flash
or how SW should interpret after reading data from flash. It merely
indicates endian-ness compared to 1-1-1 mode.

So, its up to various system SWs like bootloader/Linux to work according
to pre-aligned layout as there is no rule that data needs to be stored
in byte order.

We have two types of controllers:

1. SPI controllers supporting swapping endian-ness on the fly:
-> For such flashes, better choice is to have SWAP option always
enabled. So that data written in 8D-8D-8D mode can be read correctly in
1-1-1 mode and vice-versa.
( I am assuming SWAP option of controller is only effective in 8D-8D-8D
mode and is NOP in 1-1-1 or other modes)

But, its possible that "ROM" or other non-upgradable SWs may choose not
make to use of this SWAP option of HW to keep things simple in which
case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
don't always have knowledge of flash and cannot be forced to have a
constraint to enable byte swap on read.

So, IMO, its best left to system integrators to specify whether or not
SWAP option needs to be enabled (perhaps via DT as its per flash
specific property?)

2.  SPI controllers don't support endian-ness SWAP on the fly:
It is still possible to reliably read and write data as long as its
written and read back in same mode.

De-rating speeds because of absence of this support would mean reduction
of speed by **16 times** (maybe even higher as 8D mode tends to support
higher bus freqs). Swapping bytes in Linux before writing or after
reading is not an option either as it negatively impacts performance.

Asking ROM/bootloaders to swap bytes based on SFDP indication is
restrictive too as it involves boot time penalty and most systems with
OSPI flashes are using them to achieve super fast boot times.

One more case to consider is flashes that dont have SFDP table to
indicate byte order but follow Macronix's convention. In such cases, its
better for SPI NOR layer to be as dumb as possible and not really do any
byte swapping, leaving it up to user space to handle/interpret data
appropriately.

Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
providing flash agnostic way of switching to 8D mode and reading data in
8D mode) would not care about SFDP bit indicating byteorder and its up
to flasher programs to take care of the same

IMO, kernel device drivers should just provide access to underlying HW
and not have too much intelligence to interpret data/take decisions

So, simpler constraint to put is:
Flasher programs should program data in the same mode in which
ROM/bootloder/Linux is expected to read the data on that system.

For Macronix like flashes, if one has a ROM/bootloader that only
supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
please generate a byte-swapped image offline and flash it. Don't impose
penalty on systems that do best to handle this messy situation.
I see this as the only option with least performance penalty.

Regards
Vignesh

[...]

Tudor Ambarus March 15, 2022, 6:58 a.m. UTC | #2

On 3/15/22 08:08, Vignesh Raghavendra wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi,

Hi,

> 
> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit boundary
>> when configured in Octal DTR mode. The byte order of 16-bit words is
>> swapped when read or written in Octal Double Transfer Rate (DTR) mode
>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision because
>> it may introduce some endianness problems. It can affect the boot sequence
>> if the entire boot sequence is not handled in either 8D-8D-8D mode or 1-1-1
>> mode. So we must swap the bytes back to have the same byte order as in STR
>> modes. Fortunately there are controllers that can swap the bytes back at
>> runtime, addressing the flash's endiannesses requirements.
>> If the controllers are not capable of swapping the bytes, the protocol is
>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the swapping
>> of the bytes is always done regardless if it's a data or register access,
>> so that we comply with the JESD216 requirements: "Byte order of 16-bit
>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>>
> 
> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is

no worries

> quite restrictive IMO.
> 
> AFAIK, SFDP standard does not dictate how data should be stored in flash
> or how SW should interpret after reading data from flash. It merely
> indicates endian-ness compared to 1-1-1 mode.
> 
> So, its up to various system SWs like bootloader/Linux to work according
> to pre-aligned layout as there is no rule that data needs to be stored
> in byte order.
> 
> We have two types of controllers:
> 
> 1. SPI controllers supporting swapping endian-ness on the fly:
> -> For such flashes, better choice is to have SWAP option always
> enabled. So that data written in 8D-8D-8D mode can be read correctly in
> 1-1-1 mode and vice-versa.
> ( I am assuming SWAP option of controller is only effective in 8D-8D-8D
> mode and is NOP in 1-1-1 or other modes)
> 
> But, its possible that "ROM" or other non-upgradable SWs may choose not
> make to use of this SWAP option of HW to keep things simple in which
> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
> don't always have knowledge of flash and cannot be forced to have a
> constraint to enable byte swap on read.
> 
> So, IMO, its best left to system integrators to specify whether or not
> SWAP option needs to be enabled (perhaps via DT as its per flash
> specific property?)

we can't use DT for configuration, maybe a Kconfig instead. Are there any
other options?

> 
> 2.  SPI controllers don't support endian-ness SWAP on the fly:
> It is still possible to reliably read and write data as long as its
> written and read back in same mode.
> 
> De-rating speeds because of absence of this support would mean reduction
> of speed by **16 times** (maybe even higher as 8D mode tends to support
> higher bus freqs). Swapping bytes in Linux before writing or after
> reading is not an option either as it negatively impacts performance.
> 
> Asking ROM/bootloaders to swap bytes based on SFDP indication is
> restrictive too as it involves boot time penalty and most systems with
> OSPI flashes are using them to achieve super fast boot times.
> 
> One more case to consider is flashes that dont have SFDP table to
> indicate byte order but follow Macronix's convention. In such cases, its
> better for SPI NOR layer to be as dumb as possible and not really do any
> byte swapping, leaving it up to user space to handle/interpret data
> appropriately.
> 
> Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
> providing flash agnostic way of switching to 8D mode and reading data in
> 8D mode) would not care about SFDP bit indicating byteorder and its up
> to flasher programs to take care of the same

This is a bit in contradiction, because if the ROMs follow xSPI, thus little
endian byte order, they should swap the bytes.

> 
> IMO, kernel device drivers should just provide access to underlying HW
> and not have too much intelligence to interpret data/take decisions
> 
> So, simpler constraint to put is:
> Flasher programs should program data in the same mode in which
> ROM/bootloder/Linux is expected to read the data on that system.

No, this constraint doesn't cover all possible cases: take a 1-1-1 ROMcode,
8D-8D-8D for other bootloaders and kernel. You need to dynamically change modes
in the flasher program in order to address this use case, which is a no go.
 
> 
> For Macronix like flashes, if one has a ROM/bootloader that only
> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
> please generate a byte-swapped image offline and flash it. Don't impose

we can't do that, see the example from above.

> penalty on systems that do best to handle this messy situation.
> I see this as the only option with least performance penalty.
> 

I take from this that we should let the byte swap be user-configurable,
thus a Kconfig. Which I'm not against it, but it will give users
headaches to sync all the software components. Making such a decision
implies that users know SPI NOR internal details, which also is a bit
stretched. Let's sync so that we move forward with this. Opinions? 

Cheers,
ta

Michael Walle March 15, 2022, 7:19 a.m. UTC | #3

Hi,

Am 2022-03-15 07:08, schrieb Vignesh Raghavendra:
> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit 
>> boundary
>> when configured in Octal DTR mode. The byte order of 16-bit words is
>> swapped when read or written in Octal Double Transfer Rate (DTR) mode
>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 
>> D3
>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it 
>> will
>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision 
>> because
>> it may introduce some endianness problems. It can affect the boot 
>> sequence
>> if the entire boot sequence is not handled in either 8D-8D-8D mode or 
>> 1-1-1
>> mode. So we must swap the bytes back to have the same byte order as in 
>> STR
>> modes. Fortunately there are controllers that can swap the bytes back 
>> at
>> runtime, addressing the flash's endiannesses requirements.
>> If the controllers are not capable of swapping the bytes, the protocol 
>> is
>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the 
>> swapping
>> of the bytes is always done regardless if it's a data or register 
>> access,
>> so that we comply with the JESD216 requirements: "Byte order of 16-bit
>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>> 
> 
> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
> quite restrictive IMO.
> 
> AFAIK, SFDP standard does not dictate how data should be stored in 
> flash
> or how SW should interpret after reading data from flash. It merely
> indicates endian-ness compared to 1-1-1 mode.

Mh, but below you are saying that Micronix is violating the standard
and is swapping the bytes. So, the standard is actually specifying the
byte order.

> So, its up to various system SWs like bootloader/Linux to work 
> according
> to pre-aligned layout as there is no rule that data needs to be stored
> in byte order.
> 
> We have two types of controllers:
> 
> 1. SPI controllers supporting swapping endian-ness on the fly:
> -> For such flashes, better choice is to have SWAP option always
> enabled. So that data written in 8D-8D-8D mode can be read correctly in
> 1-1-1 mode and vice-versa.
> ( I am assuming SWAP option of controller is only effective in 8D-8D-8D
> mode and is NOP in 1-1-1 or other modes)

Why should it be always enabled? You can also say it should always
be disabled. It doesn't matter if the byte order isn't important.

But I say the byte order *is* important. We need one reference.

> But, its possible that "ROM" or other non-upgradable SWs may choose not
> make to use of this SWAP option of HW to keep things simple in which
> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
> don't always have knowledge of flash and cannot be forced to have a
> constraint to enable byte swap on read.
> 
> So, IMO, its best left to system integrators to specify whether or not
> SWAP option needs to be enabled (perhaps via DT as its per flash
> specific property?)

Agreed, but I don't think we have to do it now. If someone cares,
he can make a patch. Also we have to consider non-DT platforms.

> 2.  SPI controllers don't support endian-ness SWAP on the fly:
> It is still possible to reliably read and write data as long as its
> written and read back in same mode.
> 
> De-rating speeds because of absence of this support would mean 
> reduction
> of speed by **16 times** (maybe even higher as 8D mode tends to support
> higher bus freqs).

You can also fall back to a quad mode. Again, can be implemented if
someone cares.

> Swapping bytes in Linux before writing or after
> reading is not an option either as it negatively impacts performance.
> 
> Asking ROM/bootloaders to swap bytes based on SFDP indication is
> restrictive too as it involves boot time penalty and most systems with
> OSPI flashes are using them to achieve super fast boot times.

No we are talking about what? ms? to read the sfdp? If that is really
a use case, the the bootloader can hardcode it there.

> One more case to consider is flashes that dont have SFDP table to
> indicate byte order but follow Macronix's convention. In such cases, 
> its
> better for SPI NOR layer to be as dumb as possible and not really do 
> any
> byte swapping, leaving it up to user space to handle/interpret data
> appropriately.
> 
> Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
> providing flash agnostic way of switching to 8D mode and reading data 
> in
> 8D mode) would not care about SFDP bit indicating byteorder and its up
> to flasher programs to take care of the same
> 
> IMO, kernel device drivers should just provide access to underlying HW
> and not have too much intelligence to interpret data/take decisions

I strongly disagree here. The kernel should provide a consistent
view of the flash content, regardless if its read in 1-1-1 or 8d-8d-8d.
Imagine you are forced to switch away from 8d-8d-8d mode (for whatever
reason) after some time. How would you know how to read the contents?

JESD216 is a standard, too. There is a reason this bit ended up in
there. If I had to guess, someone messed up the byte order in 8d-8d-8d
but it was too late. And now the standard is giving you a hint that
you are using a flash with that messed up byte ordering.

I want to avoid having flash contents where the byte are swapped.
The sooner we care about that, the better. You cannot undo that
later on.

> So, simpler constraint to put is:
> Flasher programs should program data in the same mode in which
> ROM/bootloder/Linux is expected to read the data on that system.

So what if your flasher only has 1 pin? With this you are just
shifting the problem elsewhere.

> For Macronix like flashes, if one has a ROM/bootloader that only
> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
> please generate a byte-swapped image offline and flash it. Don't impose
> penalty on systems that do best to handle this messy situation.
> I see this as the only option with least performance penalty.

I certainly have nothing against an option to turn this all off
to improve speed. But the swapping (if asked to do so) and the
degradation should be an *opt-out*. Not an opt-in. Nobody will
do the opt-in and we end up with 'corrupted' flash contents.

-michael

Vignesh Raghavendra March 16, 2022, 7:05 a.m. UTC | #4

Hi Michael,

On 15/03/22 12:49 pm, Michael Walle wrote:
> Hi,
> 
> Am 2022-03-15 07:08, schrieb Vignesh Raghavendra:
>> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit
>>> boundary
>>> when configured in Octal DTR mode. The byte order of 16-bit words is
>>> swapped when read or written in Octal Double Transfer Rate (DTR) mode
>>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
>>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
>>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision
>>> because
>>> it may introduce some endianness problems. It can affect the boot
>>> sequence
>>> if the entire boot sequence is not handled in either 8D-8D-8D mode or
>>> 1-1-1
>>> mode. So we must swap the bytes back to have the same byte order as
>>> in STR
>>> modes. Fortunately there are controllers that can swap the bytes back at
>>> runtime, addressing the flash's endiannesses requirements.
>>> If the controllers are not capable of swapping the bytes, the
>>> protocol is
>>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the
>>> swapping
>>> of the bytes is always done regardless if it's a data or register
>>> access,
>>> so that we comply with the JESD216 requirements: "Byte order of 16-bit
>>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>>>
>>
>> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
>> quite restrictive IMO.
>>
>> AFAIK, SFDP standard does not dictate how data should be stored in flash
>> or how SW should interpret after reading data from flash. It merely
>> indicates endian-ness compared to 1-1-1 mode.
> 
> Mh, but below you are saying that Micronix is violating the standard
> and is swapping the bytes. So, the standard is actually specifying the
> byte order.
> 

As I understand, SFDP spec(JESD216) is way of describing flash details,
it does not enforce any rules on how flash should "behave". It just
provides info for drivers to discover flash parameters at runtime.
OTOH, xSPI spec (JESD251) lays down guidelines for SW interface,
electrical, and mechanical interfaces. Macronix flash deviates from xSPI
spec but no SFDP per say (unless it claims xSPI compliance elsewhere in
SFDP tables).

>> So, its up to various system SWs like bootloader/Linux to work according
>> to pre-aligned layout as there is no rule that data needs to be stored
>> in byte order.
>>
>> We have two types of controllers:
>>
>> 1. SPI controllers supporting swapping endian-ness on the fly:
>> -> For such flashes, better choice is to have SWAP option always
>> enabled. So that data written in 8D-8D-8D mode can be read correctly in
>> 1-1-1 mode and vice-versa.
>> ( I am assuming SWAP option of controller is only effective in 8D-8D-8D
>> mode and is NOP in 1-1-1 or other modes)
> 
> Why should it be always enabled? You can also say it should always
> be disabled. It doesn't matter if the byte order isn't important.
> 
> But I say the byte order *is* important. We need one reference.
> 
>> But, its possible that "ROM" or other non-upgradable SWs may choose not
>> make to use of this SWAP option of HW to keep things simple in which
>> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
>> don't always have knowledge of flash and cannot be forced to have a
>> constraint to enable byte swap on read.
>>
>> So, IMO, its best left to system integrators to specify whether or not
>> SWAP option needs to be enabled (perhaps via DT as its per flash
>> specific property?)
> 
> Agreed, but I don't think we have to do it now. If someone cares,

I am fine with that, but I want to make sure we are not saying
controllers w/o ability to swap byte order can never support Macronix
like flash in 8D-8D-8D mode.

> he can make a patch. Also we have to consider non-DT platforms.
> 

non DT platforms also have a way to pass HW related data

>> 2.  SPI controllers don't support endian-ness SWAP on the fly:
>> It is still possible to reliably read and write data as long as its
>> written and read back in same mode.
>>
>> De-rating speeds because of absence of this support would mean reduction
>> of speed by **16 times** (maybe even higher as 8D mode tends to support
>> higher bus freqs).
> 
> You can also fall back to a quad mode. Again, can be implemented if
> someone cares.

Actually I have not come across OSPI flash that also supports quad mode.
So, fallback is 1-1-1 for majority of these flashes

> 
>> Swapping bytes in Linux before writing or after
>> reading is not an option either as it negatively impacts performance.
>>
>> Asking ROM/bootloaders to swap bytes based on SFDP indication is
>> restrictive too as it involves boot time penalty and most systems with
>> OSPI flashes are using them to achieve super fast boot times.
> 
> No we are talking about what? ms? to read the sfdp? If that is really
> a use case, the the bootloader can hardcode it there.
> 

Reading SFDP is not the issue but swapping entire image is the problem.
Takes several tens of ms depending on boot image size which is not
acceptable to systems looking at < 100 - 200msms start up times for
quick bootup usecases involving early Display / CAN output etc

>> One more case to consider is flashes that dont have SFDP table to
>> indicate byte order but follow Macronix's convention. In such cases, its
>> better for SPI NOR layer to be as dumb as possible and not really do any
>> byte swapping, leaving it up to user space to handle/interpret data
>> appropriately.
>>
>> Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
>> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
>> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
>> providing flash agnostic way of switching to 8D mode and reading data in
>> 8D mode) would not care about SFDP bit indicating byteorder and its up
>> to flasher programs to take care of the same
>>
>> IMO, kernel device drivers should just provide access to underlying HW
>> and not have too much intelligence to interpret data/take decisions
> 
> I strongly disagree here. The kernel should provide a consistent
> view of the flash content, regardless if its read in 1-1-1 or 8d-8d-8d.
> Imagine you are forced to switch away from 8d-8d-8d mode (for whatever
> reason) after some time. How would you know how to read the contents?
> 

Agree with need for consistent view, although if we need to fall back to
1-1-1 mode for a while and switch back to 8D-8D-8D mode then there is
something fundamentally broken.

> JESD216 is a standard, too. There is a reason this bit ended up in
> there. If I had to guess, someone messed up the byte order in 8d-8d-8d
> but it was too late. And now the standard is giving you a hint that
> you are using a flash with that messed up byte ordering.
> 

Yes, but note JESD216 is all about giving hint to softwares  on how
flash behaves, it does not dictate what softwares need to do with it.
So another OS is free to leave these Macronix flashes using native byte
ordering and not swapping on read / write

> I want to avoid having flash contents where the byte are swapped.
> The sooner we care about that, the better. You cannot undo that
> later on.
> 
>> So, simpler constraint to put is:
>> Flasher programs should program data in the same mode in which
>> ROM/bootloder/Linux is expected to read the data on that system.
> 
> So what if your flasher only has 1 pin? With this you are just
> shifting the problem elsewhere.
> 

flasher program would have to use byte swapped image if read is expected
to be in 8D-8D mode

>> For Macronix like flashes, if one has a ROM/bootloader that only
>> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
>> please generate a byte-swapped image offline and flash it. Don't impose
>> penalty on systems that do best to handle this messy situation.
>> I see this as the only option with least performance penalty.
> 
> I certainly have nothing against an option to turn this all off
> to improve speed. But the swapping (if asked to do so) and the
> degradation should be an *opt-out*. Not an opt-in. Nobody will
> do the opt-in and we end up with 'corrupted' flash contents.
> 

Sounds good to me. We should note this in commit message disabling
8D-8D-8D mode for these flashes.

Regards
Vignesh

Vignesh Raghavendra March 16, 2022, 7:08 a.m. UTC | #5

On 15/03/22 12:28 pm, Tudor.Ambarus@microchip.com wrote:
> On 3/15/22 08:08, Vignesh Raghavendra wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>
>> Hi,
> 
> Hi,
> 
>>
>> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit boundary
>>> when configured in Octal DTR mode. The byte order of 16-bit words is
>>> swapped when read or written in Octal Double Transfer Rate (DTR) mode
>>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
>>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
>>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision because
>>> it may introduce some endianness problems. It can affect the boot sequence
>>> if the entire boot sequence is not handled in either 8D-8D-8D mode or 1-1-1
>>> mode. So we must swap the bytes back to have the same byte order as in STR
>>> modes. Fortunately there are controllers that can swap the bytes back at
>>> runtime, addressing the flash's endiannesses requirements.
>>> If the controllers are not capable of swapping the bytes, the protocol is
>>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the swapping
>>> of the bytes is always done regardless if it's a data or register access,
>>> so that we comply with the JESD216 requirements: "Byte order of 16-bit
>>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>>>
>>
>> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
> 
> no worries
> 
>> quite restrictive IMO.
>>
>> AFAIK, SFDP standard does not dictate how data should be stored in flash
>> or how SW should interpret after reading data from flash. It merely
>> indicates endian-ness compared to 1-1-1 mode.
>>
>> So, its up to various system SWs like bootloader/Linux to work according
>> to pre-aligned layout as there is no rule that data needs to be stored
>> in byte order.
>>
>> We have two types of controllers:
>>
>> 1. SPI controllers supporting swapping endian-ness on the fly:
>> -> For such flashes, better choice is to have SWAP option always
>> enabled. So that data written in 8D-8D-8D mode can be read correctly in
>> 1-1-1 mode and vice-versa.
>> ( I am assuming SWAP option of controller is only effective in 8D-8D-8D
>> mode and is NOP in 1-1-1 or other modes)
>>
>> But, its possible that "ROM" or other non-upgradable SWs may choose not
>> make to use of this SWAP option of HW to keep things simple in which
>> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
>> don't always have knowledge of flash and cannot be forced to have a
>> constraint to enable byte swap on read.
>>
>> So, IMO, its best left to system integrators to specify whether or not
>> SWAP option needs to be enabled (perhaps via DT as its per flash
>> specific property?)
> 
> we can't use DT for configuration, maybe a Kconfig instead. Are there any
> other options?
> 

Problem with Kconfig is that it cannot be used when there are multiple
flash instances on the board and both behave differently.

This is similar to big-endian vs little-endian property used by CFI
physmap driver and other places in kernel[1]

>>
>> 2.  SPI controllers don't support endian-ness SWAP on the fly:
>> It is still possible to reliably read and write data as long as its
>> written and read back in same mode.
>>
>> De-rating speeds because of absence of this support would mean reduction
>> of speed by **16 times** (maybe even higher as 8D mode tends to support
>> higher bus freqs). Swapping bytes in Linux before writing or after
>> reading is not an option either as it negatively impacts performance.
>>
>> Asking ROM/bootloaders to swap bytes based on SFDP indication is
>> restrictive too as it involves boot time penalty and most systems with
>> OSPI flashes are using them to achieve super fast boot times.
>>
>> One more case to consider is flashes that dont have SFDP table to
>> indicate byte order but follow Macronix's convention. In such cases, its
>> better for SPI NOR layer to be as dumb as possible and not really do any
>> byte swapping, leaving it up to user space to handle/interpret data
>> appropriately.
>>
>> Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
>> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
>> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
>> providing flash agnostic way of switching to 8D mode and reading data in
>> 8D mode) would not care about SFDP bit indicating byteorder and its up
>> to flasher programs to take care of the same
> 
> This is a bit in contradiction, because if the ROMs follow xSPI, thus little
> endian byte order, they should swap the bytes.
> 

As I indicated below xSPI (JESD251) spec seems to align on little endian
order. So, don't really need to swap the bytes on read. Anyways impact
on performance prohibits such support and thus most ROM designs will
expect flashing program to take care of it (especially in case where
controller does not support on the fly swapping).

>>
>> IMO, kernel device drivers should just provide access to underlying HW
>> and not have too much intelligence to interpret data/take decisions
>>
>> So, simpler constraint to put is:
>> Flasher programs should program data in the same mode in which
>> ROM/bootloder/Linux is expected to read the data on that system.
> 
> No, this constraint doesn't cover all possible cases: take a 1-1-1 ROMcode,
> 8D-8D-8D for other bootloaders and kernel. You need to dynamically change modes
> in the flasher program in order to address this use case, which is a no go.
> 

Flash programmer need not change the mode, but needs to generate a byte
swapped image and write it knowing that image is being flashed in
8D-8D-8D mode and read in 1-1-1 mode or vice versa.

>>
>> For Macronix like flashes, if one has a ROM/bootloader that only
>> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
>> please generate a byte-swapped image offline and flash it. Don't impose
> 
> we can't do that, see the example from above.
> 

See above, no need to change mode dynamically.

>> penalty on systems that do best to handle this messy situation.
>> I see this as the only option with least performance penalty.
>>
> 
> I take from this that we should let the byte swap be user-configurable,
> thus a Kconfig. Which I'm not against it, but it will give users
> headaches to sync all the software components. Making such a decision
> implies that users know SPI NOR internal details, which also is a bit
> stretched. Let's sync so that we move forward with this. Opinions? 
> 

I am fine with things being configurable

Only issues with Kconfig: it does not help systems with multiple OSPIs
with different endian-ness.


[1] Documentation/devicetree/bindings/mtd/mtd-physmap.yaml


Regards
Vignesh

Tudor Ambarus March 16, 2022, 7:57 a.m. UTC | #6

On 3/16/22 09:05, Vignesh Raghavendra wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi Michael,
> 
> On 15/03/22 12:49 pm, Michael Walle wrote:
>> Hi,
>>
>> Am 2022-03-15 07:08, schrieb Vignesh Raghavendra:
>>> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>>>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit
>>>> boundary
>>>> when configured in Octal DTR mode. The byte order of 16-bit words is
>>>> swapped when read or written in Octal Double Transfer Rate (DTR) mode
>>>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 D3
>>>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it will
>>>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision
>>>> because
>>>> it may introduce some endianness problems. It can affect the boot
>>>> sequence
>>>> if the entire boot sequence is not handled in either 8D-8D-8D mode or
>>>> 1-1-1
>>>> mode. So we must swap the bytes back to have the same byte order as
>>>> in STR
>>>> modes. Fortunately there are controllers that can swap the bytes back at
>>>> runtime, addressing the flash's endiannesses requirements.
>>>> If the controllers are not capable of swapping the bytes, the
>>>> protocol is
>>>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the
>>>> swapping
>>>> of the bytes is always done regardless if it's a data or register
>>>> access,
>>>> so that we comply with the JESD216 requirements: "Byte order of 16-bit
>>>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>>>>
>>>
>>> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
>>> quite restrictive IMO.
>>>
>>> AFAIK, SFDP standard does not dictate how data should be stored in flash
>>> or how SW should interpret after reading data from flash. It merely
>>> indicates endian-ness compared to 1-1-1 mode.
>>
>> Mh, but below you are saying that Micronix is violating the standard
>> and is swapping the bytes. So, the standard is actually specifying the
>> byte order.
>>
> 
> As I understand, SFDP spec(JESD216) is way of describing flash details,
> it does not enforce any rules on how flash should "behave". It just
> provides info for drivers to discover flash parameters at runtime.
> OTOH, xSPI spec (JESD251) lays down guidelines for SW interface,
> electrical, and mechanical interfaces. Macronix flash deviates from xSPI
> spec but no SFDP per say (unless it claims xSPI compliance elsewhere in
> SFDP tables).
> 
>>> So, its up to various system SWs like bootloader/Linux to work according
>>> to pre-aligned layout as there is no rule that data needs to be stored
>>> in byte order.
>>>
>>> We have two types of controllers:
>>>
>>> 1. SPI controllers supporting swapping endian-ness on the fly:
>>> -> For such flashes, better choice is to have SWAP option always
>>> enabled. So that data written in 8D-8D-8D mode can be read correctly in
>>> 1-1-1 mode and vice-versa.
>>> ( I am assuming SWAP option of controller is only effective in 8D-8D-8D
>>> mode and is NOP in 1-1-1 or other modes)
>>
>> Why should it be always enabled? You can also say it should always
>> be disabled. It doesn't matter if the byte order isn't important.
>>
>> But I say the byte order *is* important. We need one reference.
>>
>>> But, its possible that "ROM" or other non-upgradable SWs may choose not
>>> make to use of this SWAP option of HW to keep things simple in which
>>> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
>>> don't always have knowledge of flash and cannot be forced to have a
>>> constraint to enable byte swap on read.
>>>
>>> So, IMO, its best left to system integrators to specify whether or not
>>> SWAP option needs to be enabled (perhaps via DT as its per flash
>>> specific property?)
>>
>> Agreed, but I don't think we have to do it now. If someone cares,
> 
> I am fine with that, but I want to make sure we are not saying
> controllers w/o ability to swap byte order can never support Macronix
> like flash in 8D-8D-8D mode.
> 
>> he can make a patch. Also we have to consider non-DT platforms.
>>
> 
> non DT platforms also have a way to pass HW related data
> 
>>> 2.  SPI controllers don't support endian-ness SWAP on the fly:
>>> It is still possible to reliably read and write data as long as its
>>> written and read back in same mode.
>>>
>>> De-rating speeds because of absence of this support would mean reduction
>>> of speed by **16 times** (maybe even higher as 8D mode tends to support
>>> higher bus freqs).
>>
>> You can also fall back to a quad mode. Again, can be implemented if
>> someone cares.
> 
> Actually I have not come across OSPI flash that also supports quad mode.
> So, fallback is 1-1-1 for majority of these flashes

8-8-8 is typically supported where 8d-8d-8d is supported. But downgrading to
8-8-8 still comes with a performance penalty.

> 
>>
>>> Swapping bytes in Linux before writing or after
>>> reading is not an option either as it negatively impacts performance.
>>>
>>> Asking ROM/bootloaders to swap bytes based on SFDP indication is
>>> restrictive too as it involves boot time penalty and most systems with
>>> OSPI flashes are using them to achieve super fast boot times.
>>
>> No we are talking about what? ms? to read the sfdp? If that is really
>> a use case, the the bootloader can hardcode it there.
>>
> 
> Reading SFDP is not the issue but swapping entire image is the problem.
> Takes several tens of ms depending on boot image size which is not
> acceptable to systems looking at < 100 - 200msms start up times for
> quick bootup usecases involving early Display / CAN output etc
> 
>>> One more case to consider is flashes that dont have SFDP table to
>>> indicate byte order but follow Macronix's convention. In such cases, its
>>> better for SPI NOR layer to be as dumb as possible and not really do any
>>> byte swapping, leaving it up to user space to handle/interpret data
>>> appropriately.
>>>
>>> Also, Macronix is probably in violation of xSPI spec (JESD251A 6.9.5.2
>>> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should be
>>> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
>>> providing flash agnostic way of switching to 8D mode and reading data in
>>> 8D mode) would not care about SFDP bit indicating byteorder and its up
>>> to flasher programs to take care of the same
>>>
>>> IMO, kernel device drivers should just provide access to underlying HW
>>> and not have too much intelligence to interpret data/take decisions
>>
>> I strongly disagree here. The kernel should provide a consistent
>> view of the flash content, regardless if its read in 1-1-1 or 8d-8d-8d.
>> Imagine you are forced to switch away from 8d-8d-8d mode (for whatever
>> reason) after some time. How would you know how to read the contents?
>>
> 
> Agree with need for consistent view, although if we need to fall back to
> 1-1-1 mode for a while and switch back to 8D-8D-8D mode then there is
> something fundamentally broken.
> 
>> JESD216 is a standard, too. There is a reason this bit ended up in
>> there. If I had to guess, someone messed up the byte order in 8d-8d-8d
>> but it was too late. And now the standard is giving you a hint that
>> you are using a flash with that messed up byte ordering.
>>
> 
> Yes, but note JESD216 is all about giving hint to softwares  on how
> flash behaves, it does not dictate what softwares need to do with it.
> So another OS is free to leave these Macronix flashes using native byte
> ordering and not swapping on read / write
> 
>> I want to avoid having flash contents where the byte are swapped.
>> The sooner we care about that, the better. You cannot undo that
>> later on.
>>
>>> So, simpler constraint to put is:
>>> Flasher programs should program data in the same mode in which
>>> ROM/bootloder/Linux is expected to read the data on that system.
>>
>> So what if your flasher only has 1 pin? With this you are just
>> shifting the problem elsewhere.
>>
> 
> flasher program would have to use byte swapped image if read is expected
> to be in 8D-8D mode

:) 

> 
>>> For Macronix like flashes, if one has a ROM/bootloader that only
>>> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, then
>>> please generate a byte-swapped image offline and flash it. Don't impose
>>> penalty on systems that do best to handle this messy situation.
>>> I see this as the only option with least performance penalty.
>>
>> I certainly have nothing against an option to turn this all off
>> to improve speed. But the swapping (if asked to do so) and the
>> degradation should be an *opt-out*. Not an opt-in. Nobody will
>> do the opt-in and we end up with 'corrupted' flash contents.
>>
> 
> Sounds good to me. We should note this in commit message disabling
> 8D-8D-8D mode for these flashes.
> 

I like Michael's *opt-out* idea too. But I'm not convinced with a dt property
used to configure when to swap the bytes and when not. DT should be used to
describe the hardware, not to configure it. So we can add a "dtr-swab16"
property but IMO it should specify just that the bytes are swapped in 8D-8D-8D
mode, just as SFDP does, and not to configure the logic to swab16 or not. And
if it just describes the hardware and not configuring it, then the entire dt
idea diminishes because it is equivalent with setting a SNOR_F flag either
by a flash-info flag or at the SFDP parsing time.

Michael Walle March 16, 2022, 8:39 a.m. UTC | #7

Am 2022-03-16 08:08, schrieb Vignesh Raghavendra:
> On 15/03/22 12:28 pm, Tudor.Ambarus@microchip.com wrote:
>> On 3/15/22 08:08, Vignesh Raghavendra wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you 
>>> know the content is safe
>>> 
>>> Hi,
>> 
>> Hi,
>> 
>>> 
>>> On 11/03/22 1:31 pm, Tudor Ambarus wrote:
>>>> There are NOR flashes (Macronix) that swap the bytes on a 16-bit 
>>>> boundary
>>>> when configured in Octal DTR mode. The byte order of 16-bit words is
>>>> swapped when read or written in Octal Double Transfer Rate (DTR) 
>>>> mode
>>>> compared to Single Transfer Rate (STR) modes. If one writes D0 D1 D2 
>>>> D3
>>>> bytes using 1-1-1 mode, and uses 8D-8D-8D SPI mode for reading, it 
>>>> will
>>>> read back D1 D0 D3 D2. Swapping the bytes is a bad design decision 
>>>> because
>>>> it may introduce some endianness problems. It can affect the boot 
>>>> sequence
>>>> if the entire boot sequence is not handled in either 8D-8D-8D mode 
>>>> or 1-1-1
>>>> mode. So we must swap the bytes back to have the same byte order as 
>>>> in STR
>>>> modes. Fortunately there are controllers that can swap the bytes 
>>>> back at
>>>> runtime, addressing the flash's endiannesses requirements.
>>>> If the controllers are not capable of swapping the bytes, the 
>>>> protocol is
>>>> downgraded via spi_nor_spimem_adjust_hwcaps(). When available, the 
>>>> swapping
>>>> of the bytes is always done regardless if it's a data or register 
>>>> access,
>>>> so that we comply with the JESD216 requirements: "Byte order of 
>>>> 16-bit
>>>> words is swapped when read in 8D-8D-8D mode compared to 1-1-1".
>>>> 
>>> 
>>> Sorry, bit late to the thread. But, dropping 8D-8D-8D mode support is
>> 
>> no worries
>> 
>>> quite restrictive IMO.
>>> 
>>> AFAIK, SFDP standard does not dictate how data should be stored in 
>>> flash
>>> or how SW should interpret after reading data from flash. It merely
>>> indicates endian-ness compared to 1-1-1 mode.
>>> 
>>> So, its up to various system SWs like bootloader/Linux to work 
>>> according
>>> to pre-aligned layout as there is no rule that data needs to be 
>>> stored
>>> in byte order.
>>> 
>>> We have two types of controllers:
>>> 
>>> 1. SPI controllers supporting swapping endian-ness on the fly:
>>> -> For such flashes, better choice is to have SWAP option always
>>> enabled. So that data written in 8D-8D-8D mode can be read correctly 
>>> in
>>> 1-1-1 mode and vice-versa.
>>> ( I am assuming SWAP option of controller is only effective in 
>>> 8D-8D-8D
>>> mode and is NOP in 1-1-1 or other modes)
>>> 
>>> But, its possible that "ROM" or other non-upgradable SWs may choose 
>>> not
>>> make to use of this SWAP option of HW to keep things simple in which
>>> case, they cannot boot from 8D-8D-8D mode with above setting. Such SW
>>> don't always have knowledge of flash and cannot be forced to have a
>>> constraint to enable byte swap on read.
>>> 
>>> So, IMO, its best left to system integrators to specify whether or 
>>> not
>>> SWAP option needs to be enabled (perhaps via DT as its per flash
>>> specific property?)
>> 
>> we can't use DT for configuration, maybe a Kconfig instead. Are there 
>> any
>> other options?
>> 
> 
> Problem with Kconfig is that it cannot be used when there are multiple
> flash instances on the board and both behave differently.
> 
> This is similar to big-endian vs little-endian property used by CFI
> physmap driver and other places in kernel[1]

On more thing to consider is that if we are using Kconfig things
cannot be changed during runtime. While this might be ok when you
are building your typical embedded distribution, there are efforts
to be able to boot standard distributions on embedded platforms (see
ARM SystemReady). A Kconfig option won't work there.

-michael

>>> 
>>> 2.  SPI controllers don't support endian-ness SWAP on the fly:
>>> It is still possible to reliably read and write data as long as its
>>> written and read back in same mode.
>>> 
>>> De-rating speeds because of absence of this support would mean 
>>> reduction
>>> of speed by **16 times** (maybe even higher as 8D mode tends to 
>>> support
>>> higher bus freqs). Swapping bytes in Linux before writing or after
>>> reading is not an option either as it negatively impacts performance.
>>> 
>>> Asking ROM/bootloaders to swap bytes based on SFDP indication is
>>> restrictive too as it involves boot time penalty and most systems 
>>> with
>>> OSPI flashes are using them to achieve super fast boot times.
>>> 
>>> One more case to consider is flashes that dont have SFDP table to
>>> indicate byte order but follow Macronix's convention. In such cases, 
>>> its
>>> better for SPI NOR layer to be as dumb as possible and not really do 
>>> any
>>> byte swapping, leaving it up to user space to handle/interpret data
>>> appropriately.
>>> 
>>> Also, Macronix is probably in violation of xSPI spec (JESD251A 
>>> 6.9.5.2
>>> 8D-8D-8D Profile 1.0) where diagrams clearly show data output should 
>>> be
>>> D0 D1 D2 D3... So ROMs following xSPI spec (which is the only spec
>>> providing flash agnostic way of switching to 8D mode and reading data 
>>> in
>>> 8D mode) would not care about SFDP bit indicating byteorder and its 
>>> up
>>> to flasher programs to take care of the same
>> 
>> This is a bit in contradiction, because if the ROMs follow xSPI, thus 
>> little
>> endian byte order, they should swap the bytes.
>> 
> 
> As I indicated below xSPI (JESD251) spec seems to align on little 
> endian
> order. So, don't really need to swap the bytes on read. Anyways impact
> on performance prohibits such support and thus most ROM designs will
> expect flashing program to take care of it (especially in case where
> controller does not support on the fly swapping).
> 
>>> 
>>> IMO, kernel device drivers should just provide access to underlying 
>>> HW
>>> and not have too much intelligence to interpret data/take decisions
>>> 
>>> So, simpler constraint to put is:
>>> Flasher programs should program data in the same mode in which
>>> ROM/bootloder/Linux is expected to read the data on that system.
>> 
>> No, this constraint doesn't cover all possible cases: take a 1-1-1 
>> ROMcode,
>> 8D-8D-8D for other bootloaders and kernel. You need to dynamically 
>> change modes
>> in the flasher program in order to address this use case, which is a 
>> no go.
>> 
> 
> Flash programmer need not change the mode, but needs to generate a byte
> swapped image and write it knowing that image is being flashed in
> 8D-8D-8D mode and read in 1-1-1 mode or vice versa.
> 
>>> 
>>> For Macronix like flashes, if one has a ROM/bootloader that only
>>> supports 1-1-1 mode and flashing data in 8D-8D-8D mode with Linux, 
>>> then
>>> please generate a byte-swapped image offline and flash it. Don't 
>>> impose
>> 
>> we can't do that, see the example from above.
>> 
> 
> See above, no need to change mode dynamically.
> 
>>> penalty on systems that do best to handle this messy situation.
>>> I see this as the only option with least performance penalty.
>>> 
>> 
>> I take from this that we should let the byte swap be 
>> user-configurable,
>> thus a Kconfig. Which I'm not against it, but it will give users
>> headaches to sync all the software components. Making such a decision
>> implies that users know SPI NOR internal details, which also is a bit
>> stretched. Let's sync so that we move forward with this. Opinions?
>> 
> 
> I am fine with things being configurable
> 
> Only issues with Kconfig: it does not help systems with multiple OSPIs
> with different endian-ness.
> 
> 
> [1] Documentation/devicetree/bindings/mtd/mtd-physmap.yaml
> 
> 
> Regards
> Vignesh

David Laight March 16, 2022, 1:55 p.m. UTC | #8

Thought...

Can you read the device in STR mode until you get a suitable
non-palindromic value, then read it in DTR mode and dynamically
determine the byte order?

Clearly this won't work if the device is erased to all 0xff.
But a check could be done on/after the first write.

I suspect write times are actually dominated by the time spent
waiting for the write to complete?
(Never mind the earlier block erase time.)
So always writing in STR mode probably makes little difference?
Writes really ought to be uncommon as well.

Speeding up reads is a different matter - and probably useful.

Of course, if you've got hardware reading the spi memory in DTR
mode for config data you might need to byteswap it (compared
to the STR writes) - but that is probably a 2nd order problem.

I've got some bespoke logic on an PCIe fpga for accessing spi memory.
Uses address bits for the control signals and converts a 32bit
read/write into 8 nibble transfers to the chip.
(uses byte enables - don't an odd number of clocks.)
mmapp()ed to userspace for updating the 6MB fpga image.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Michael Walle March 17, 2022, 9:40 a.m. UTC | #9

Hi,

Am 2022-03-16 14:55, schrieb David Laight:
> Thought...

Thank you for your proposal.

> Can you read the device in STR mode until you get a suitable
> non-palindromic value, then read it in DTR mode and dynamically
> determine the byte order?
> 
> Clearly this won't work if the device is erased to all 0xff.
> But a check could be done on/after the first write.
> 
> I suspect write times are actually dominated by the time spent
> waiting for the write to complete?
> (Never mind the earlier block erase time.)
> So always writing in STR mode probably makes little difference?
> Writes really ought to be uncommon as well.
> 
> Speeding up reads is a different matter - and probably useful.
> 
> Of course, if you've got hardware reading the spi memory in DTR
> mode for config data you might need to byteswap it (compared
> to the STR writes) - but that is probably a 2nd order problem.
> 
> I've got some bespoke logic on an PCIe fpga for accessing spi memory.
> Uses address bits for the control signals and converts a 32bit
> read/write into 8 nibble transfers to the chip.
> (uses byte enables - don't an odd number of clocks.)
> mmapp()ed to userspace for updating the 6MB fpga image.

Our problem is not how to detect that we have to swap it, but
rather what we do when we have to do it.

If we have a controller which can swap the bytes for us on the
fly, we are lucky and can enable swapping if we need it. We are
also lucky when we don't have to swap the flash contents, obviously.

But what do we do when we need to swap it and the controller
doesn't support it. We could do it in software which will slow
things down. So depending on the use case this might or might not
work. We can degrade it to a speed which doesn't have this issue;
which might be 1-1-1 in the worst case. We could also do just
nothing special; but this will lead to inconsistencies between
reading in 1-1-1 and 8d-8d-8d.

-michael

David Laight March 17, 2022, 10:14 a.m. UTC | #10

From: Michael Walle
> Sent: 17 March 2022 09:40
> 
> Am 2022-03-16 14:55, schrieb David Laight:
> > Thought...
> 
> Thank you for your proposal.
> 
> > Can you read the device in STR mode until you get a suitable
> > non-palindromic value, then read it in DTR mode and dynamically
> > determine the byte order?
> >
> > Clearly this won't work if the device is erased to all 0xff.
> > But a check could be done on/after the first write.
> >
> > I suspect write times are actually dominated by the time spent
> > waiting for the write to complete?
> > (Never mind the earlier block erase time.)
> > So always writing in STR mode probably makes little difference?
> > Writes really ought to be uncommon as well.
> >
> > Speeding up reads is a different matter - and probably useful.
> >
> > Of course, if you've got hardware reading the spi memory in DTR
> > mode for config data you might need to byteswap it (compared
> > to the STR writes) - but that is probably a 2nd order problem.
> >
> > I've got some bespoke logic on an PCIe fpga for accessing spi memory.
> > Uses address bits for the control signals and converts a 32bit
> > read/write into 8 nibble transfers to the chip.
> > (uses byte enables - don't an odd number of clocks.)
> > mmapp()ed to userspace for updating the 6MB fpga image.
> 
> Our problem is not how to detect that we have to swap it, but
> rather what we do when we have to do it.
> 
> If we have a controller which can swap the bytes for us on the
> fly, we are lucky and can enable swapping if we need it. We are
> also lucky when we don't have to swap the flash contents, obviously.
> 
> But what do we do when we need to swap it and the controller
> doesn't support it. We could do it in software which will slow
> things down. So depending on the use case this might or might not
> work. We can degrade it to a speed which doesn't have this issue;
> which might be 1-1-1 in the worst case. We could also do just
> nothing special; but this will lead to inconsistencies between
> reading in 1-1-1 and 8d-8d-8d.

I really doubt you'll notice the effects of a software byteswap
compared to the actual time taken to do an spi read.

What's the maximum clock rate for spi memory?
Something like 50MHz ?
If the spi controller isn't doing dma then the cpu pio reads
to get the data are very likely to be even slower than that.
(Especially if they are PCIe reads.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Vignesh Raghavendra March 17, 2022, 10:23 a.m. UTC | #11

On 17/03/22 3:44 pm, David Laight wrote:
> From: Michael Walle
>> Sent: 17 March 2022 09:40
>>
>> Am 2022-03-16 14:55, schrieb David Laight:
>>> Thought...
>>
>> Thank you for your proposal.
>>
>>> Can you read the device in STR mode until you get a suitable
>>> non-palindromic value, then read it in DTR mode and dynamically
>>> determine the byte order?
>>>
>>> Clearly this won't work if the device is erased to all 0xff.
>>> But a check could be done on/after the first write.
>>>
>>> I suspect write times are actually dominated by the time spent
>>> waiting for the write to complete?
>>> (Never mind the earlier block erase time.)
>>> So always writing in STR mode probably makes little difference?
>>> Writes really ought to be uncommon as well.
>>>
>>> Speeding up reads is a different matter - and probably useful.
>>>
>>> Of course, if you've got hardware reading the spi memory in DTR
>>> mode for config data you might need to byteswap it (compared
>>> to the STR writes) - but that is probably a 2nd order problem.
>>>
>>> I've got some bespoke logic on an PCIe fpga for accessing spi memory.
>>> Uses address bits for the control signals and converts a 32bit
>>> read/write into 8 nibble transfers to the chip.
>>> (uses byte enables - don't an odd number of clocks.)
>>> mmapp()ed to userspace for updating the 6MB fpga image.
>>
>> Our problem is not how to detect that we have to swap it, but
>> rather what we do when we have to do it.
>>
>> If we have a controller which can swap the bytes for us on the
>> fly, we are lucky and can enable swapping if we need it. We are
>> also lucky when we don't have to swap the flash contents, obviously.
>>
>> But what do we do when we need to swap it and the controller
>> doesn't support it. We could do it in software which will slow
>> things down. So depending on the use case this might or might not
>> work. We can degrade it to a speed which doesn't have this issue;
>> which might be 1-1-1 in the worst case. We could also do just
>> nothing special; but this will lead to inconsistencies between
>> reading in 1-1-1 and 8d-8d-8d.
> 
> I really doubt you'll notice the effects of a software byteswap
> compared to the actual time taken to do an spi read.
> 
> What's the maximum clock rate for spi memory?
> Something like 50MHz ?

We have Octal SPI flashes running at upwards of 200MHz clock (400MB/s)
so SW byteswap will add significant overhead.


> If the spi controller isn't doing dma then the cpu pio reads
> to get the data are very likely to be even slower than that.
> (Especially if they are PCIe reads.)
> 

Modern OSPI/QSPI flash controllers provide MMIO interface to read from
flash where DMA can pull data as if though you are reading from On chip RAM

Regards
Vignesh

David Laight March 17, 2022, 11:10 a.m. UTC | #12

From: Vignesh Raghavendra
> Sent: 17 March 2022 10:24
...
> Modern OSPI/QSPI flash controllers provide MMIO interface to read from
> flash where DMA can pull data as if though you are reading from On chip RAM

So the cpu does an MMIO read cycle to the controller which doesn't
complete until (for the nibble-mode spi device I have):
1) Chipselect is asserted.
2) The 8-bit command has been clocked out.
3) The 32bit address have been clocked out (8 clocks in nibbles).
4) A few (probably 4) extra delay clocks are added.
5) The data is read - 8 clocks for 32bits in nibble mode.
6) Chipselect is removed.

Now you can do long sequential reads without all the red tape.
But a random read in nibble mode is about 30 clocks.
16 bit mode saves 6 clocks for the data and maybe 6 for the address?

The controller could do 'clever stuff' for sequential reads.
At a cost of slowing down random reads.

So even at 400MHz it isn't that fast.

If the MMIO interface to the flash controller is PCIe you can
add in a load of extra latency for the cpu read itself.

While PCIe allows multiple read requests to be outstanding,
the Intel cpu I've looked at serialise the reads from each
cpu core (each cpu always uses the same TLP tag).

Now longer read TLP help a lot (IIRC max is 256 bytes).
But the x86 cpu will only generate read TLP for register reads.
You need to use AVX512 registers (or cache line fetches) to
get better throughput!

The alternative is getting the flash controller to issue
the read/write TLP for memory transfers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Vignesh Raghavendra March 17, 2022, 4:49 p.m. UTC | #13

On 17/03/22 4:40 pm, David Laight wrote:
> From: Vignesh Raghavendra
>> Sent: 17 March 2022 10:24
> ...
>> Modern OSPI/QSPI flash controllers provide MMIO interface to read from
>> flash where DMA can pull data as if though you are reading from On chip RAM
> 
> So the cpu does an MMIO read cycle to the controller which doesn't
> complete until (for the nibble-mode spi device I have):
> 1) Chipselect is asserted.
> 2) The 8-bit command has been clocked out.
> 3) The 32bit address have been clocked out (8 clocks in nibbles).
> 4) A few (probably 4) extra delay clocks are added.
> 5) The data is read - 8 clocks for 32bits in nibble mode.
> 6) Chipselect is removed.
> 
> Now you can do long sequential reads without all the red tape.
> But a random read in nibble mode is about 30 clocks.
> 16 bit mode saves 6 clocks for the data and maybe 6 for the address?
> 
> The controller could do 'clever stuff' for sequential reads.
> At a cost of slowing down random reads.
> 
> So even at 400MHz it isn't that fast.

Random CPU reads would be inherently slow, its just how HW is.

But, there are cases like image load from flash and Filesystem over
flash which would use DMA to maximize performance, such cases would be
greatly affected if we do SW byte swap

> 
> If the MMIO interface to the flash controller is PCIe you can
> add in a load of extra latency for the cpu read itself.
> 
> While PCIe allows multiple read requests to be outstanding,
> the Intel cpu I've looked at serialise the reads from each
> cpu core (each cpu always uses the same TLP tag).
> 
> Now longer read TLP help a lot (IIRC max is 256 bytes).
> But the x86 cpu will only generate read TLP for register reads.
> You need to use AVX512 registers (or cache line fetches) to
> get better throughput!
> 

Direct CPU fetch from SPI would not be able to make use of full
Bandwidth for high speed flashes and its not the only usecase.

Regards
Vignesh

[v2,0/6] spi-mem: Allow specifying the byte order in DTR mode

Message

Comments