diff mbox

[v5] spi: orion.c: Add direct access mode

Message ID 1460974417-32375-1-git-send-email-sr@denx.de (mailing list archive)
State New, archived
Headers show

Commit Message

Stefan Roese April 18, 2016, 10:13 a.m. UTC
This patch adds support for the direct access mode to the Orion SPI
driver which is used on the Marvell Armada based SoCs. In this direct
mode, all data written to (or read from) a specifically mapped MBus
window (linked to one SPI chip-select on one of the SPI controllers)
will be transferred directly to the SPI bus. Without the need to control
the SPI registers in between. This can improve the SPI transfer rate in
such cases.

Both, direct-read and -write mode are supported. But only the write
mode has been tested. This mode especially benefits from the SPI direct
mode, as the data bytes are written head-to-head to the SPI bus,
without any additional addresses.

One use-case for this direct write mode is, programming a FPGA bitstream
image into the FPGA connected to the SPI bus at maximum speed.

This mode is described in chapter "22.5.2 Direct Write to SPI" in the
Marvell Armada XP Functional Spec Datasheet.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mark Brown <broonie@kernel.org>
---
v5:
- Added some documentation for the direct mode to the orion-spi DT
  bindings document, including an example and a reference to the
  mbus DT bindings documentation.

v4:
- Add max_transfer_size()
- Fall back from direct-write mode to PIO mode when
  - transfer size is exceedeed
  - non-8bit transfers
- Restrict to direct-write mode for now. SPI direct read mode can
  be added later, when someone has a platform supporting this
  mode.

v3:
- Used static MBus windows again, as suggested by Arnd
- No new DT bindings needed for this. The MBus windows need to be
  configured in the 'reg' property of the SPI controller. This is
  done in a separate patch for the Armada platforms.
- The direct mapping is configured and enable by adding the
  specific MBus entry to the 'ranges' property of the 'soc' node
  in the board dts file.

v2:
- Use one MBus window for each SPI controller instead of one for
  for each SPI device. This MBus window is re-assigned, once the
  CS changes.
- Assert the CS over the entire message
- Don't restrict the direct access mode to only direct write mode
- Add check for max size before using memcpy()
- Remove spidev from DT bindings documentation

 .../devicetree/bindings/spi/spi-orion.txt          | 47 +++++++++++-
 drivers/spi/spi-orion.c                            | 87 ++++++++++++++++++++++
 2 files changed, 133 insertions(+), 1 deletion(-)

Comments

Arnd Bergmann April 18, 2016, 10:51 a.m. UTC | #1
On Monday 18 April 2016 12:13:37 Stefan Roese wrote:
> This patch adds support for the direct access mode to the Orion SPI
> driver which is used on the Marvell Armada based SoCs. In this direct
> mode, all data written to (or read from) a specifically mapped MBus
> window (linked to one SPI chip-select on one of the SPI controllers)
> will be transferred directly to the SPI bus. Without the need to control
> the SPI registers in between. This can improve the SPI transfer rate in
> such cases.
> 
> Both, direct-read and -write mode are supported. But only the write
> mode has been tested. This mode especially benefits from the SPI direct
> mode, as the data bytes are written head-to-head to the SPI bus,
> without any additional addresses.
> 
> One use-case for this direct write mode is, programming a FPGA bitstream
> image into the FPGA connected to the SPI bus at maximum speed.
> 
> This mode is described in chapter "22.5.2 Direct Write to SPI" in the
> Marvell Armada XP Functional Spec Datasheet.
> 
> Signed-off-by: Stefan Roese <sr@denx.de>
> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Mark Brown <broonie@kernel.org>
> ---
> v5:
> - Added some documentation for the direct mode to the orion-spi DT
>   bindings document, including an example and a reference to the
>   mbus DT bindings documentation.

Thanks, looks good overall. A couple of minor points here:

> -- reg : offset and length of the register set for the device
> +- reg : offset and length of the register set for the device.
> +	This property can optionally have additional entries to configure
> +	the SPI direct access mode that some of the Marvell SoCs support
> +	additionally to the normal indirect access (PIO) mode. The values
> +	for the MBus "target" and "attribute" are defined in the Marvell
> +	SoC "Functional Specifications" Manual in the chapter "Marvell
> +	Core Processor Address Decoding".

While it might be obvious, I'd add something like

       The eight register sets following the control registers refer to
       chipselect lines 0 through 7 respectively.

>  - cell-index : Which of multiple SPI controllers is this.
>  Optional properties:
>  - interrupts : Is currently not used.
> @@ -23,3 +29,42 @@ Example:
>  	       interrupts = <23>;
>  	       status = "disabled";
>         };
> +
> +Example with SPI direct mode support (optionally):
> +	spi0: spi@10600 {
> +		compatible = "marvell,orion-spi";
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		cell-index = <0>;
> +		reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28	/* control */
> +		       MBUS_ID(0x01, 0x1e) 0 0x100000	/* CS0 */
> +		       MBUS_ID(0x01, 0x5e) 0 0x100000	/* CS1 */
> +		       MBUS_ID(0x01, 0x9e) 0 0x100000	/* CS2 */
> +		       MBUS_ID(0x01, 0xde) 0 0x100000	/* CS3 */
> +		       MBUS_ID(0x01, 0x1f) 0 0x100000	/* CS4 */
> +		       MBUS_ID(0x01, 0x5f) 0 0x100000	/* CS5 */
> +		       MBUS_ID(0x01, 0x9f) 0 0x100000	/* CS6 */
> +		       MBUS_ID(0x01, 0xdf) 0 0x100000>;	/* CS7 */

I prefer writing this as

		reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>,	/* control */
		      <MBUS_ID(0x01, 0x1e) 0 0x100000>,	/* CS0 */
		      <MBUS_ID(0x01, 0x5e) 0 0x100000>,	/* CS1 */
		      <MBUS_ID(0x01, 0x9e) 0 0x100000>,	/* CS2 */
		      <MBUS_ID(0x01, 0xde) 0 0x100000>,	/* CS3 */
		      <MBUS_ID(0x01, 0x1f) 0 0x100000>,	/* CS4 */
		      <MBUS_ID(0x01, 0x5f) 0 0x100000>,	/* CS5 */
		      <MBUS_ID(0x01, 0x9f) 0 0x100000>,	/* CS6 */
		      <MBUS_ID(0x01, 0xdf) 0 0x100000>;	/* CS7 */

Same for the ranges below.

Are you sure about the 1MB length for each one? I don't remember seeing this
limitation in the datasheet, so maybe the length should be specified as
0xffffffff (we can't use 0x10000000 unfortunately as that doesn't fit within
a 32-bit cell), to make it possible to map larger register areas.

> +To enable the direct mode, the board specific 'ranges' property in the
> +'soc' node needs to add the entries for the desired SPI controllers
> +and its chip-selects that are used in the direct mode instead of PIO
> +mode. Here an example for this (SPI controller 0, device 1 and SPI
> +controller 1, device 2 are used in direct mode. All other SPI device
> +are used in the default indirect (PIO) mode):
> +	soc {
> +		/*
> +		 * Enable the SPI direct access by configuring an entry
> +		 * here in the board-specific ranges property
> +		 */
> +		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000	/* internal regs */
> +			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000	/* BootROM       */
> +			  MBUS_ID(0x01, 0x5e) 0 0 0xf1100000 0x100000	/* SPI0-DEV1     */
> +			  MBUS_ID(0x01, 0x9a) 0 0 0xf1200000 0x100000>;	/* SPI1-DEV2     */
> +
> +For further information on the MBus bindings, please see the MBus
> +DT documentation:
> +Documentation/devicetree/bindings/bus/mvebu-mbus.txt

This does however raise an interesting question about the size that we actually
want to map:

Your patch at the moment maps the entire register area that is listed for
a given CS, which simplifies the code, but it means that we can't easily
provide a smaller map in the mbus ranges when we know we don't need as
much address space. I had not considered that in my previous emails.

This is not a problem for the external bus because we can just map however
much space the attached device requires, but for SPI devices, the address
field in DT is just the CS number, not a range of addresses and the partitions
of e.g. an SPI NOR flash are then in a separate DT address space that does not
get translated into CPU addresses using 'ranges'.

This would be easier if have a conclusive proof that 1MB per CS always enough.
Is this something that is guaranteed in the SPI spec or the documentation for
this controller?

	Arnd
Mark Brown April 18, 2016, 11:04 a.m. UTC | #2
On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote:

> This would be easier if have a conclusive proof that 1MB per CS always enough.
> Is this something that is guaranteed in the SPI spec or the documentation for
> this controller?

There's no spec for SPI but if there were it'd be hard to see it
imposing a limit, one can transfer data as long as the bus is clocked
(which some things like ADCs and DACs make use of).
Arnd Bergmann April 18, 2016, 11:32 a.m. UTC | #3
On Monday 18 April 2016 12:04:15 Mark Brown wrote:
> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote:
> 
> > This would be easier if have a conclusive proof that 1MB per CS always enough.
> > Is this something that is guaranteed in the SPI spec or the documentation for
> > this controller?
> 
> There's no spec for SPI but if there were it'd be hard to see it
> imposing a limit, one can transfer data as long as the bus is clocked
> (which some things like ADCs and DACs make use of).
> 

I just reread Stefan's patch and realized that I had fundamentally
misunderstood how the transfer is done: I thought the offset inside of
the window was used to address a NOR flash directly, but it seems
it is just used to send arbitrarily long commands.

This means that the 1MB window is probably a reasonable size (provided
that the (most importantly) the SPI-NOR driver can guarantee that it
never exceeds this).

It also means that we are probably better off using the single-window mode
of the SPI controller, where all CS lines share a single mbus window.
The only real difference here is that we can map all endpoints using a
single contiguous window into the CPU address space if we want, or we
can still map them separately on a given board if that results in a nicer
layout.

	Arnd
Stefan Roese April 18, 2016, 1 p.m. UTC | #4
On 18.04.2016 13:32, Arnd Bergmann wrote:
> On Monday 18 April 2016 12:04:15 Mark Brown wrote:
>> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote:
>>
>>> This would be easier if have a conclusive proof that 1MB per CS always enough.
>>> Is this something that is guaranteed in the SPI spec or the documentation for
>>> this controller?
>>
>> There's no spec for SPI but if there were it'd be hard to see it
>> imposing a limit, one can transfer data as long as the bus is clocked
>> (which some things like ADCs and DACs make use of).
>>
>
> I just reread Stefan's patch and realized that I had fundamentally
> misunderstood how the transfer is done: I thought the offset inside of
> the window was used to address a NOR flash directly, but it seems
> it is just used to send arbitrarily long commands.

Yes. SPI NOR flash support definitely needs some additional patches.
To support the address and commands being inserted in the SPI
messages via the corresponding registers. Again, this patch is
only intended and tested for very "simple" TX messages without any
addresses being inserted at all.

> This means that the 1MB window is probably a reasonable size (provided
> that the (most importantly) the SPI-NOR driver can guarantee that it
> never exceeds this).

I have no problem switching from 1MiB to using 0xffffffff in the
SPI controller 'reg' node to allow even bigger windows in v6.

> It also means that we are probably better off using the single-window mode
> of the SPI controller, where all CS lines share a single mbus window.
> The only real difference here is that we can map all endpoints using a
> single contiguous window into the CPU address space if we want, or we
> can still map them separately on a given board if that results in a nicer
> layout.

Frankly, I've just read about this single-window mode for the first
time. What I miss here in this mode though, is the configuration of
the SPI device (chip-select 0...7) for this direct-mode in the SPI
driver. With the currently implemented separate window method, your
suggested method is easy: Either is window can be mapped (direct-mode)
or not (indirect-mode). With this single-window mode I'm missing the
per-device configuration for direct-mode. Or what is your idea on
how to configure this access-mode in this case?

Thanks,
Stefan
Arnd Bergmann April 18, 2016, 1:57 p.m. UTC | #5
On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
> On 18.04.2016 13:32, Arnd Bergmann wrote:
> > On Monday 18 April 2016 12:04:15 Mark Brown wrote:
> >> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote:
> >>
> >>> This would be easier if have a conclusive proof that 1MB per CS always enough.
> >>> Is this something that is guaranteed in the SPI spec or the documentation for
> >>> this controller?
> >>
> >> There's no spec for SPI but if there were it'd be hard to see it
> >> imposing a limit, one can transfer data as long as the bus is clocked
> >> (which some things like ADCs and DACs make use of).
> >>
> >
> > I just reread Stefan's patch and realized that I had fundamentally
> > misunderstood how the transfer is done: I thought the offset inside of
> > the window was used to address a NOR flash directly, but it seems
> > it is just used to send arbitrarily long commands.
> 
> Yes. SPI NOR flash support definitely needs some additional patches.
> To support the address and commands being inserted in the SPI
> messages via the corresponding registers. Again, this patch is
> only intended and tested for very "simple" TX messages without any
> addresses being inserted at all.

Ok, so I looked up the datasheet again and understood now that it
actually does both ways: a) the direct read/write that I had always assumed
it did with automatic cmd/addr/data generation, and b) the mode where
you do each write transfer separately as implemented by your patch.

Those two modes seem to be wildly different in their needs for address
space:

- For a) we definitely need the mapping to be the same size as the
  addressable storage of the SPI-NOR (or other device that fits
  into the cmd/addr/data pattern supported by the hardware)

- For b) the address is ignored and at most 32 bytes are transfered,
  so whatever the smallest MBUS mapping window size is would certainly
  be enough.

As a side-note, I see that you use a regular devm_ioremap_resource(),
which I assume prevents the transfers from ever being more than
a single 4-byte word, while burst mode with up to 32 bytes likely
requires a write-combining mapping.

> > This means that the 1MB window is probably a reasonable size (provided
> > that the (most importantly) the SPI-NOR driver can guarantee that it
> > never exceeds this).
> 
> I have no problem switching from 1MiB to using 0xffffffff in the
> SPI controller 'reg' node to allow even bigger windows in v6.

How do you decide how much of the window to map though? Using 0xffffffff
is nice because it better represents what the hardware looks like
and doesn't limit you from having arbitrarily large mbus windows
based on your needs, but you'd probably have to try mapping various
sizes then to find the maximum supported one, or you'd have to do
some ugly parsing of the parent node ranges.

> > It also means that we are probably better off using the single-window mode
> > of the SPI controller, where all CS lines share a single mbus window.
> > The only real difference here is that we can map all endpoints using a
> > single contiguous window into the CPU address space if we want, or we
> > can still map them separately on a given board if that results in a nicer
> > layout.
> 
> Frankly, I've just read about this single-window mode for the first
> time. What I miss here in this mode though, is the configuration of
> the SPI device (chip-select 0...7) for this direct-mode in the SPI
> driver. With the currently implemented separate window method, your
> suggested method is easy: Either is window can be mapped (direct-mode)
> or not (indirect-mode). With this single-window mode I'm missing the
> per-device configuration for direct-mode. Or what is your idea on
> how to configure this access-mode in this case?

It would work the same way, the main difference is that for single-window
mode all the devices get the same amount of address MBUS address space,
e.g. 1MB per CS in a contiguous MBUS range, but you can still map
subsections of it into the CPU, e.g.


&mbus {
	ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000>,   /* internal regs */
		 <MBUS_ID(0x01, 0x1e) 0x300000 0 0xf1100000 0x200000>, /* CS3+CS4 */
		 <MBUS_ID(0x01, 0x1d) 0x600000 0 0xf1300000 0x80000>, /* bootrom */
		 <MBUS_ID(0x01, 0x1e) 0x600000 0 0xf1380000 0x10000>; /* CS6 */

	spi@f0,01,10600 {
                reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>,       /* control */
                      <MBUS_ID(0x01, 0x1e) 0x000000 0x100000>, /* CS0 */
                      <MBUS_ID(0x01, 0x1e) 0x100000 0x100000>, /* CS1 */
                      <MBUS_ID(0x01, 0x1e) 0x200000 0x100000>, /* CS2 */
                      <MBUS_ID(0x01, 0x1e) 0x300000 0x100000>, /* CS3 */
                      <MBUS_ID(0x01, 0x1e) 0x400000 0x100000>, /* CS4 */
                      <MBUS_ID(0x01, 0x1e) 0x500000 0x100000>, /* CS5 */
                      <MBUS_ID(0x01, 0x1e) 0x600000 0x100000>, /* CS6 */
                      <MBUS_ID(0x01, 0x1e) 0x700000 0x100000>; /* CS7 */
	};
};

In this example, you have to configure the SPI controller to have 1MB per CS
in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
registers. The mbus then contains two mappings, one 2MB window spanning
CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
CS6, making it as small as possible to conserve CPU address space.

	Arnd
Stefan Roese April 19, 2016, 9:42 a.m. UTC | #6
On 18.04.2016 15:57, Arnd Bergmann wrote:
> On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
>> On 18.04.2016 13:32, Arnd Bergmann wrote:
>>> On Monday 18 April 2016 12:04:15 Mark Brown wrote:
>>>> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote:
>>>>
>>>>> This would be easier if have a conclusive proof that 1MB per CS always enough.
>>>>> Is this something that is guaranteed in the SPI spec or the documentation for
>>>>> this controller?
>>>>
>>>> There's no spec for SPI but if there were it'd be hard to see it
>>>> imposing a limit, one can transfer data as long as the bus is clocked
>>>> (which some things like ADCs and DACs make use of).
>>>>
>>>
>>> I just reread Stefan's patch and realized that I had fundamentally
>>> misunderstood how the transfer is done: I thought the offset inside of
>>> the window was used to address a NOR flash directly, but it seems
>>> it is just used to send arbitrarily long commands.
>>
>> Yes. SPI NOR flash support definitely needs some additional patches.
>> To support the address and commands being inserted in the SPI
>> messages via the corresponding registers. Again, this patch is
>> only intended and tested for very "simple" TX messages without any
>> addresses being inserted at all.
> 
> Ok, so I looked up the datasheet again and understood now that it
> actually does both ways: a) the direct read/write that I had always assumed
> it did with automatic cmd/addr/data generation, and b) the mode where
> you do each write transfer separately as implemented by your patch.
> 
> Those two modes seem to be wildly different in their needs for address
> space:
> 
> - For a) we definitely need the mapping to be the same size as the
>    addressable storage of the SPI-NOR (or other device that fits
>    into the cmd/addr/data pattern supported by the hardware)
> 
> - For b) the address is ignored and at most 32 bytes are transfered,
>    so whatever the smallest MBUS mapping window size is would certainly
>    be enough.
>
> As a side-note, I see that you use a regular devm_ioremap_resource(),
> which I assume prevents the transfers from ever being more than
> a single 4-byte word, while burst mode with up to 32 bytes likely
> requires a write-combining mapping.

Thanks for the hint. I've tested again with devm_ioremap_wc() and the
resulting SPI TX time is identical to the one using the driver with
the uncached (non write-combined) mapping. This is most likely a
result of the already fully saturated SPI bus speed being the
bottle-neck here.

A bit more testing has shown, that using devm_ioremap_wc() leads
to sporadic failures in the FPGA image downloading though. Now
I'm unsure, if its really worth to add some cache flushing after
the memcpy() call here (flush_cache_vmap ?) or better keep the
uncached ioremap in place.

>>> This means that the 1MB window is probably a reasonable size (provided
>>> that the (most importantly) the SPI-NOR driver can guarantee that it
>>> never exceeds this).
>>
>> I have no problem switching from 1MiB to using 0xffffffff in the
>> SPI controller 'reg' node to allow even bigger windows in v6.
> 
> How do you decide how much of the window to map though? Using 0xffffffff
> is nice because it better represents what the hardware looks like
> and doesn't limit you from having arbitrarily large mbus windows
> based on your needs, but you'd probably have to try mapping various
> sizes then to find the maximum supported one, or you'd have to do
> some ugly parsing of the parent node ranges.

This is for the single-window mode, right? My comment above was
still referring to the original implementation. And yes, this
gets a bit complicated and is most likely not really necessary.
So I would prefer to keep the mapping fixed to a max. of 1MiB for
now.
 
>>> It also means that we are probably better off using the single-window mode
>>> of the SPI controller, where all CS lines share a single mbus window.
>>> The only real difference here is that we can map all endpoints using a
>>> single contiguous window into the CPU address space if we want, or we
>>> can still map them separately on a given board if that results in a nicer
>>> layout.
>>
>> Frankly, I've just read about this single-window mode for the first
>> time. What I miss here in this mode though, is the configuration of
>> the SPI device (chip-select 0...7) for this direct-mode in the SPI
>> driver. With the currently implemented separate window method, your
>> suggested method is easy: Either is window can be mapped (direct-mode)
>> or not (indirect-mode). With this single-window mode I'm missing the
>> per-device configuration for direct-mode. Or what is your idea on
>> how to configure this access-mode in this case?
> 
> It would work the same way, the main difference is that for single-window
> mode all the devices get the same amount of address MBUS address space,
> e.g. 1MB per CS in a contiguous MBUS range, but you can still map
> subsections of it into the CPU, e.g.
> 
> 
> &mbus {
> 	ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000>,   /* internal regs */
> 		 <MBUS_ID(0x01, 0x1e) 0x300000 0 0xf1100000 0x200000>, /* CS3+CS4 */
> 		 <MBUS_ID(0x01, 0x1d) 0x600000 0 0xf1300000 0x80000>, /* bootrom */
> 		 <MBUS_ID(0x01, 0x1e) 0x600000 0 0xf1380000 0x10000>; /* CS6 */
> 
> 	spi@f0,01,10600 {
>                  reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>,       /* control */
>                        <MBUS_ID(0x01, 0x1e) 0x000000 0x100000>, /* CS0 */
>                        <MBUS_ID(0x01, 0x1e) 0x100000 0x100000>, /* CS1 */
>                        <MBUS_ID(0x01, 0x1e) 0x200000 0x100000>, /* CS2 */
>                        <MBUS_ID(0x01, 0x1e) 0x300000 0x100000>, /* CS3 */
>                        <MBUS_ID(0x01, 0x1e) 0x400000 0x100000>, /* CS4 */
>                        <MBUS_ID(0x01, 0x1e) 0x500000 0x100000>, /* CS5 */
>                        <MBUS_ID(0x01, 0x1e) 0x600000 0x100000>, /* CS6 */
>                        <MBUS_ID(0x01, 0x1e) 0x700000 0x100000>; /* CS7 */
> 	};
> };
> 
> In this example, you have to configure the SPI controller to have 1MB per CS
> in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
> registers. The mbus then contains two mappings, one 2MB window spanning
> CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
> CS6, making it as small as possible to conserve CPU address space.

Thanks. I've prepared a new version implementing this single-
window mode. Please let me know how you prefer the cache issue
above to be handled (uncached mapping, flush or ?) and I'll
update the patch accordingly.

Thanks,
Stefan
Arnd Bergmann April 19, 2016, 1:41 p.m. UTC | #7
On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote:
> On 18.04.2016 15:57, Arnd Bergmann wrote:
> > On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
> >> On 18.04.2016 13:32, Arnd Bergmann wrote:
> > Ok, so I looked up the datasheet again and understood now that it
> > actually does both ways: a) the direct read/write that I had always assumed
> > it did with automatic cmd/addr/data generation, and b) the mode where
> > you do each write transfer separately as implemented by your patch.
> > 
> > Those two modes seem to be wildly different in their needs for address
> > space:
> > 
> > - For a) we definitely need the mapping to be the same size as the
> >    addressable storage of the SPI-NOR (or other device that fits
> >    into the cmd/addr/data pattern supported by the hardware)
> > 
> > - For b) the address is ignored and at most 32 bytes are transfered,
> >    so whatever the smallest MBUS mapping window size is would certainly
> >    be enough.
> >
> > As a side-note, I see that you use a regular devm_ioremap_resource(),
> > which I assume prevents the transfers from ever being more than
> > a single 4-byte word, while burst mode with up to 32 bytes likely
> > requires a write-combining mapping.
> 
> Thanks for the hint. I've tested again with devm_ioremap_wc() and the
> resulting SPI TX time is identical to the one using the driver with
> the uncached (non write-combined) mapping. This is most likely a
> result of the already fully saturated SPI bus speed being the
> bottle-neck here.

Ok

> A bit more testing has shown, that using devm_ioremap_wc() leads
> to sporadic failures in the FPGA image downloading though. Now
> I'm unsure, if its really worth to add some cache flushing after
> the memcpy() call here (flush_cache_vmap ?) or better keep the
> uncached ioremap in place.

The cache is not involved here, as devm_ioremap_wc is still uncached.
However it's possible that memcpy has other issues here. If you
don't gain anything from sending more than 32 bits at a time,
writesl() might be a better alternative, but it requires the data
to be 32-bit aligned in RAM. If the buffer is not aligned, the
memcpy() might already cause problems in case the kernel
implementation does not send the data in the correct order.

memcpy_toio() could also be helpful here.

> >>> This means that the 1MB window is probably a reasonable size (provided
> >>> that the (most importantly) the SPI-NOR driver can guarantee that it
> >>> never exceeds this).
> >>
> >> I have no problem switching from 1MiB to using 0xffffffff in the
> >> SPI controller 'reg' node to allow even bigger windows in v6.
> > 
> > How do you decide how much of the window to map though? Using 0xffffffff
> > is nice because it better represents what the hardware looks like
> > and doesn't limit you from having arbitrarily large mbus windows
> > based on your needs, but you'd probably have to try mapping various
> > sizes then to find the maximum supported one, or you'd have to do
> > some ugly parsing of the parent node ranges.
> 
> This is for the single-window mode, right? My comment above was
> still referring to the original implementation. And yes, this
> gets a bit complicated and is most likely not really necessary.
> So I would prefer to keep the mapping fixed to a max. of 1MiB for
> now.

No, I also meant the separate windows here. Based on the discussion
above, I think the driver can ideally just ioremap 4KB for devices
other than SPI-NOR, and then you can list the 0xffffffff length.

> > In this example, you have to configure the SPI controller to have 1MB per CS
> > in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
> > registers. The mbus then contains two mappings, one 2MB window spanning
> > CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
> > CS6, making it as small as possible to conserve CPU address space.
> 
> Thanks. I've prepared a new version implementing this single-
> window mode. Please let me know how you prefer the cache issue
> above to be handled (uncached mapping, flush or ?) and I'll
> update the patch accordingly.

If a write-combining mapping doesn't help you here, just use the normal
ioremap as before.

The single-window mode won't really work with SPI-NOR unless you make each
window large enough to hold the largest possible flash (128MB with
single-window, 256MB with double-window), but that size makes it harder
to gain much from saving mbus windows when you trade them for gigantic
CPU address space consumption.

Therefore, we probably want the separate-window mode to allow the
combination of large SPI flashes with other SPI devices in direct-access
mode. We can also implement the single (and/or double) window mode
in addition to that, but realistically we probably don't need that
as all machines we support in the kernel today have at most one
non-flash device.

	Arnd
Stefan Roese April 20, 2016, 7:38 a.m. UTC | #8
On 19.04.2016 15:41, Arnd Bergmann wrote:
> On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote:
>> On 18.04.2016 15:57, Arnd Bergmann wrote:
>>> On Monday 18 April 2016 15:00:52 Stefan Roese wrote:
>>>> On 18.04.2016 13:32, Arnd Bergmann wrote:
>>> Ok, so I looked up the datasheet again and understood now that it
>>> actually does both ways: a) the direct read/write that I had always assumed
>>> it did with automatic cmd/addr/data generation, and b) the mode where
>>> you do each write transfer separately as implemented by your patch.
>>>
>>> Those two modes seem to be wildly different in their needs for address
>>> space:
>>>
>>> - For a) we definitely need the mapping to be the same size as the
>>>     addressable storage of the SPI-NOR (or other device that fits
>>>     into the cmd/addr/data pattern supported by the hardware)
>>>
>>> - For b) the address is ignored and at most 32 bytes are transfered,
>>>     so whatever the smallest MBUS mapping window size is would certainly
>>>     be enough.
>>>
>>> As a side-note, I see that you use a regular devm_ioremap_resource(),
>>> which I assume prevents the transfers from ever being more than
>>> a single 4-byte word, while burst mode with up to 32 bytes likely
>>> requires a write-combining mapping.
>>
>> Thanks for the hint. I've tested again with devm_ioremap_wc() and the
>> resulting SPI TX time is identical to the one using the driver with
>> the uncached (non write-combined) mapping. This is most likely a
>> result of the already fully saturated SPI bus speed being the
>> bottle-neck here.
>
> Ok
>
>> A bit more testing has shown, that using devm_ioremap_wc() leads
>> to sporadic failures in the FPGA image downloading though. Now
>> I'm unsure, if its really worth to add some cache flushing after
>> the memcpy() call here (flush_cache_vmap ?) or better keep the
>> uncached ioremap in place.
>
> The cache is not involved here, as devm_ioremap_wc is still uncached.
> However it's possible that memcpy has other issues here. If you
> don't gain anything from sending more than 32 bits at a time,
> writesl() might be a better alternative, but it requires the data
> to be 32-bit aligned in RAM. If the buffer is not aligned, the
> memcpy() might already cause problems in case the kernel
> implementation does not send the data in the correct order.
>
> memcpy_toio() could also be helpful here.
>
>>>>> This means that the 1MB window is probably a reasonable size (provided
>>>>> that the (most importantly) the SPI-NOR driver can guarantee that it
>>>>> never exceeds this).
>>>>
>>>> I have no problem switching from 1MiB to using 0xffffffff in the
>>>> SPI controller 'reg' node to allow even bigger windows in v6.
>>>
>>> How do you decide how much of the window to map though? Using 0xffffffff
>>> is nice because it better represents what the hardware looks like
>>> and doesn't limit you from having arbitrarily large mbus windows
>>> based on your needs, but you'd probably have to try mapping various
>>> sizes then to find the maximum supported one, or you'd have to do
>>> some ugly parsing of the parent node ranges.
>>
>> This is for the single-window mode, right? My comment above was
>> still referring to the original implementation. And yes, this
>> gets a bit complicated and is most likely not really necessary.
>> So I would prefer to keep the mapping fixed to a max. of 1MiB for
>> now.
>
> No, I also meant the separate windows here. Based on the discussion
> above, I think the driver can ideally just ioremap 4KB for devices
> other than SPI-NOR, and then you can list the 0xffffffff length.
>
>>> In this example, you have to configure the SPI controller to have 1MB per CS
>>> in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode"
>>> registers. The mbus then contains two mappings, one 2MB window spanning
>>> CS3 and CS4 in a continuous CPU range, and one 64K window for a device at
>>> CS6, making it as small as possible to conserve CPU address space.
>>
>> Thanks. I've prepared a new version implementing this single-
>> window mode. Please let me know how you prefer the cache issue
>> above to be handled (uncached mapping, flush or ?) and I'll
>> update the patch accordingly.
>
> If a write-combining mapping doesn't help you here, just use the normal
> ioremap as before.

Okay, switching back to the normal ioremap.

> The single-window mode won't really work with SPI-NOR unless you make each
> window large enough to hold the largest possible flash (128MB with
> single-window, 256MB with double-window), but that size makes it harder
> to gain much from saving mbus windows when you trade them for gigantic
> CPU address space consumption.
>
> Therefore, we probably want the separate-window mode to allow the
> combination of large SPI flashes with other SPI devices in direct-access
> mode. We can also implement the single (and/or double) window mode
> in addition to that, but realistically we probably don't need that
> as all machines we support in the kernel today have at most one
> non-flash device.

Yes, this is also my feeling. While implementing this 0xffffffff
size in the SPI controller 'reg' DT node, I just stumbled over the
following problem: The resulting size of the area to map
(of_address_to_resource) is not the minimum of the 'soc' 'ranges'
entry and this 0xffffffff as I would have expected. But its always
0xffffffff. Do you know if this is general problem with this
approach or perhaps a problem / bug in the DT address probing /
translation?

Thanks,
Stefan
Arnd Bergmann April 20, 2016, 8:11 a.m. UTC | #9
On Wednesday 20 April 2016 09:38:29 Stefan Roese wrote:
> > The single-window mode won't really work with SPI-NOR unless you make each
> > window large enough to hold the largest possible flash (128MB with
> > single-window, 256MB with double-window), but that size makes it harder
> > to gain much from saving mbus windows when you trade them for gigantic
> > CPU address space consumption.
> >
> > Therefore, we probably want the separate-window mode to allow the
> > combination of large SPI flashes with other SPI devices in direct-access
> > mode. We can also implement the single (and/or double) window mode
> > in addition to that, but realistically we probably don't need that
> > as all machines we support in the kernel today have at most one
> > non-flash device.
> 
> Yes, this is also my feeling. While implementing this 0xffffffff
> size in the SPI controller 'reg' DT node, I just stumbled over the
> following problem: The resulting size of the area to map
> (of_address_to_resource) is not the minimum of the 'soc' 'ranges'
> entry and this 0xffffffff as I would have expected. But its always
> 0xffffffff. Do you know if this is general problem with this
> approach or perhaps a problem / bug in the DT address probing /
> translation?

That is the problem I was trying to explain a few mails back. The
creation of the platform resources does not look at what size can
be translated. I had not considered the possibility that this could
be changed (which would make things work here), but there are also
reasons for not changing it: if we make the resource smaller (possibly
even changing the start rather than the end), that may be even
more unexpected to other driver writers when the parent ranges 
are smaller than the device addresses.

However, I think there is a simpler solution here: for normal
devices, just always map just a single page using ioremap,
independent of what size the mbus mapping is. If you change
from memcpy() to writesl(), all writes will go to address zero
anyway, and we are already guaranteed (by not using ioremap_wc)
that there won't be any transfers that span more than four bytes
at a time (8 bytes once we use the driver on 64-bit machines
with writesq()).

For the direct spi-nor or spi-nand modes, we need extra code
in the spi controller driver, and that code will then also have
to ioremap_wc() the entire device. That ioremap should fail
if the ranges property lists a smaller area than the entire
device and it can fall back to the mode of using the 4k page
map.

	Arnd
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/spi/spi-orion.txt b/Documentation/devicetree/bindings/spi/spi-orion.txt
index 98bc698..ee9b064 100644
--- a/Documentation/devicetree/bindings/spi/spi-orion.txt
+++ b/Documentation/devicetree/bindings/spi/spi-orion.txt
@@ -8,7 +8,13 @@  Required properties:
     - "marvell,armada-380-spi", for the Armada 38x SoCs
     - "marvell,armada-390-spi", for the Armada 39x SoCs
     - "marvell,armada-xp-spi", for the Armada XP SoCs
-- reg : offset and length of the register set for the device
+- reg : offset and length of the register set for the device.
+	This property can optionally have additional entries to configure
+	the SPI direct access mode that some of the Marvell SoCs support
+	additionally to the normal indirect access (PIO) mode. The values
+	for the MBus "target" and "attribute" are defined in the Marvell
+	SoC "Functional Specifications" Manual in the chapter "Marvell
+	Core Processor Address Decoding".
 - cell-index : Which of multiple SPI controllers is this.
 Optional properties:
 - interrupts : Is currently not used.
@@ -23,3 +29,42 @@  Example:
 	       interrupts = <23>;
 	       status = "disabled";
        };
+
+Example with SPI direct mode support (optionally):
+	spi0: spi@10600 {
+		compatible = "marvell,orion-spi";
+		#address-cells = <1>;
+		#size-cells = <0>;
+		cell-index = <0>;
+		reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28	/* control */
+		       MBUS_ID(0x01, 0x1e) 0 0x100000	/* CS0 */
+		       MBUS_ID(0x01, 0x5e) 0 0x100000	/* CS1 */
+		       MBUS_ID(0x01, 0x9e) 0 0x100000	/* CS2 */
+		       MBUS_ID(0x01, 0xde) 0 0x100000	/* CS3 */
+		       MBUS_ID(0x01, 0x1f) 0 0x100000	/* CS4 */
+		       MBUS_ID(0x01, 0x5f) 0 0x100000	/* CS5 */
+		       MBUS_ID(0x01, 0x9f) 0 0x100000	/* CS6 */
+		       MBUS_ID(0x01, 0xdf) 0 0x100000>;	/* CS7 */
+		interrupts = <23>;
+		status = "disabled";
+	};
+
+To enable the direct mode, the board specific 'ranges' property in the
+'soc' node needs to add the entries for the desired SPI controllers
+and its chip-selects that are used in the direct mode instead of PIO
+mode. Here an example for this (SPI controller 0, device 1 and SPI
+controller 1, device 2 are used in direct mode. All other SPI device
+are used in the default indirect (PIO) mode):
+	soc {
+		/*
+		 * Enable the SPI direct access by configuring an entry
+		 * here in the board-specific ranges property
+		 */
+		ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000	/* internal regs */
+			  MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000	/* BootROM       */
+			  MBUS_ID(0x01, 0x5e) 0 0 0xf1100000 0x100000	/* SPI0-DEV1     */
+			  MBUS_ID(0x01, 0x9a) 0 0 0xf1200000 0x100000>;	/* SPI1-DEV2     */
+
+For further information on the MBus bindings, please see the MBus
+DT documentation:
+Documentation/devicetree/bindings/bus/mvebu-mbus.txt
diff --git a/drivers/spi/spi-orion.c b/drivers/spi/spi-orion.c
index a87cfd4..63c4453 100644
--- a/drivers/spi/spi-orion.c
+++ b/drivers/spi/spi-orion.c
@@ -18,6 +18,7 @@ 
 #include <linux/module.h>
 #include <linux/pm_runtime.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_device.h>
 #include <linux/clk.h>
 #include <linux/sizes.h>
@@ -43,6 +44,9 @@ 
 #define ORION_SPI_INT_CAUSE_REG		0x10
 #define ORION_SPI_TIMING_PARAMS_REG	0x18
 
+/* Register for the "Direct Mode" */
+#define SPI_DIRECT_WRITE_CONFIG_REG	0x20
+
 #define ORION_SPI_TMISO_SAMPLE_MASK	(0x3 << 6)
 #define ORION_SPI_TMISO_SAMPLE_1	(1 << 6)
 #define ORION_SPI_TMISO_SAMPLE_2	(2 << 6)
@@ -78,11 +82,18 @@  struct orion_spi_dev {
 	bool			is_errata_50mhz_ac;
 };
 
+struct orion_direct_acc {
+	void __iomem		*vaddr;
+	u32			size;
+};
+
 struct orion_spi {
 	struct spi_master	*master;
 	void __iomem		*base;
 	struct clk              *clk;
 	const struct orion_spi_dev *devdata;
+
+	struct orion_direct_acc	direct_access[ORION_NUM_CHIPSELECTS];
 };
 
 static inline void __iomem *spi_reg(struct orion_spi *orion_spi, u32 reg)
@@ -367,15 +378,49 @@  orion_spi_write_read_16bit(struct spi_device *spi,
 	return 1;
 }
 
+static size_t orion_spi_max_transfer_size(struct spi_device *spi)
+{
+	struct orion_spi *orion_spi;
+	int cs = spi->chip_select;
+
+	orion_spi = spi_master_get_devdata(spi->master);
+
+	/* Return max size for direct mode */
+	if (orion_spi->direct_access[cs].vaddr)
+		return orion_spi->direct_access[cs].size;
+
+	/* Return the default max size if the direct mode is not used */
+	return SIZE_MAX;
+}
+
 static unsigned int
 orion_spi_write_read(struct spi_device *spi, struct spi_transfer *xfer)
 {
 	unsigned int count;
 	int word_len;
+	struct orion_spi *orion_spi;
+	int cs = spi->chip_select;
 
 	word_len = spi->bits_per_word;
 	count = xfer->len;
 
+	orion_spi = spi_master_get_devdata(spi->master);
+
+	/*
+	 * Use SPI direct write mode if base address is available. Otherwise
+	 * fall back to PIO mode for this transfer.
+	 */
+	if ((orion_spi->direct_access[cs].vaddr) && (xfer->tx_buf) &&
+	    (count <= orion_spi->direct_access[cs].size) && (word_len == 8)) {
+		/*
+		 * Send the tx-data to the SPI device via the direct
+		 * mapped address window
+		 */
+		memcpy(orion_spi->direct_access[cs].vaddr, xfer->tx_buf, count);
+
+		return count;
+	}
+
 	if (word_len == 8) {
 		const u8 *tx = xfer->tx_buf;
 		u8 *rx = xfer->rx_buf;
@@ -425,6 +470,10 @@  static int orion_spi_reset(struct orion_spi *orion_spi)
 {
 	/* Verify that the CS is deasserted */
 	orion_spi_clrbits(orion_spi, ORION_SPI_IF_CTRL_REG, 0x1);
+
+	/* Don't deassert CS between the direct mapped SPI transfers */
+	writel(0, spi_reg(orion_spi, SPI_DIRECT_WRITE_CONFIG_REG));
+
 	return 0;
 }
 
@@ -504,6 +553,7 @@  static int orion_spi_probe(struct platform_device *pdev)
 	struct resource *r;
 	unsigned long tclk_hz;
 	int status = 0;
+	struct device_node *np;
 
 	master = spi_alloc_master(&pdev->dev, sizeof(*spi));
 	if (master == NULL) {
@@ -529,6 +579,7 @@  static int orion_spi_probe(struct platform_device *pdev)
 	master->setup = orion_spi_setup;
 	master->bits_per_word_mask = SPI_BPW_MASK(8) | SPI_BPW_MASK(16);
 	master->auto_runtime_pm = true;
+	master->max_transfer_size = orion_spi_max_transfer_size;
 
 	platform_set_drvdata(pdev, master);
 
@@ -576,6 +627,42 @@  static int orion_spi_probe(struct platform_device *pdev)
 		goto out_rel_clk;
 	}
 
+	/* Scan all SPI devices of this controller for direct mapped devices */
+	for_each_available_child_of_node(pdev->dev.of_node, np) {
+		u32 cs;
+
+		/* Get chip-select number from the "reg" property */
+		status = of_property_read_u32(np, "reg", &cs);
+		if (status) {
+			dev_err(&pdev->dev,
+				"%s has no valid 'reg' property (%d)\n",
+				np->full_name, status);
+			status = 0;
+			continue;
+		}
+
+		/*
+		 * Check if an address is configured for this SPI device. If
+		 * not, the MBus mapping via the 'ranges' property in the 'soc'
+		 * node is not configured and this device should not use the
+		 * direct mode. In this case, just continue with the next
+		 * device.
+		 */
+		status = of_address_to_resource(pdev->dev.of_node, cs + 1, r);
+		if (status)
+			continue;
+
+		spi->direct_access[cs].vaddr = devm_ioremap_resource(&pdev->dev,
+								     r);
+		if (IS_ERR(spi->direct_access[cs].vaddr)) {
+			status = PTR_ERR(spi->direct_access[cs].vaddr);
+			goto out_rel_clk;
+		}
+
+		spi->direct_access[cs].size = resource_size(r);
+		dev_info(&pdev->dev, "CS%d configured for direct access\n", cs);
+	}
+
 	pm_runtime_set_active(&pdev->dev);
 	pm_runtime_use_autosuspend(&pdev->dev);
 	pm_runtime_set_autosuspend_delay(&pdev->dev, SPI_AUTOSUSPEND_TIMEOUT);