Message ID | 1460974417-32375-1-git-send-email-sr@denx.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Monday 18 April 2016 12:13:37 Stefan Roese wrote: > This patch adds support for the direct access mode to the Orion SPI > driver which is used on the Marvell Armada based SoCs. In this direct > mode, all data written to (or read from) a specifically mapped MBus > window (linked to one SPI chip-select on one of the SPI controllers) > will be transferred directly to the SPI bus. Without the need to control > the SPI registers in between. This can improve the SPI transfer rate in > such cases. > > Both, direct-read and -write mode are supported. But only the write > mode has been tested. This mode especially benefits from the SPI direct > mode, as the data bytes are written head-to-head to the SPI bus, > without any additional addresses. > > One use-case for this direct write mode is, programming a FPGA bitstream > image into the FPGA connected to the SPI bus at maximum speed. > > This mode is described in chapter "22.5.2 Direct Write to SPI" in the > Marvell Armada XP Functional Spec Datasheet. > > Signed-off-by: Stefan Roese <sr@denx.de> > Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> > Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> > Cc: Andrew Lunn <andrew@lunn.ch> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Mark Brown <broonie@kernel.org> > --- > v5: > - Added some documentation for the direct mode to the orion-spi DT > bindings document, including an example and a reference to the > mbus DT bindings documentation. Thanks, looks good overall. A couple of minor points here: > -- reg : offset and length of the register set for the device > +- reg : offset and length of the register set for the device. > + This property can optionally have additional entries to configure > + the SPI direct access mode that some of the Marvell SoCs support > + additionally to the normal indirect access (PIO) mode. The values > + for the MBus "target" and "attribute" are defined in the Marvell > + SoC "Functional Specifications" Manual in the chapter "Marvell > + Core Processor Address Decoding". While it might be obvious, I'd add something like The eight register sets following the control registers refer to chipselect lines 0 through 7 respectively. > - cell-index : Which of multiple SPI controllers is this. > Optional properties: > - interrupts : Is currently not used. > @@ -23,3 +29,42 @@ Example: > interrupts = <23>; > status = "disabled"; > }; > + > +Example with SPI direct mode support (optionally): > + spi0: spi@10600 { > + compatible = "marvell,orion-spi"; > + #address-cells = <1>; > + #size-cells = <0>; > + cell-index = <0>; > + reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28 /* control */ > + MBUS_ID(0x01, 0x1e) 0 0x100000 /* CS0 */ > + MBUS_ID(0x01, 0x5e) 0 0x100000 /* CS1 */ > + MBUS_ID(0x01, 0x9e) 0 0x100000 /* CS2 */ > + MBUS_ID(0x01, 0xde) 0 0x100000 /* CS3 */ > + MBUS_ID(0x01, 0x1f) 0 0x100000 /* CS4 */ > + MBUS_ID(0x01, 0x5f) 0 0x100000 /* CS5 */ > + MBUS_ID(0x01, 0x9f) 0 0x100000 /* CS6 */ > + MBUS_ID(0x01, 0xdf) 0 0x100000>; /* CS7 */ I prefer writing this as reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>, /* control */ <MBUS_ID(0x01, 0x1e) 0 0x100000>, /* CS0 */ <MBUS_ID(0x01, 0x5e) 0 0x100000>, /* CS1 */ <MBUS_ID(0x01, 0x9e) 0 0x100000>, /* CS2 */ <MBUS_ID(0x01, 0xde) 0 0x100000>, /* CS3 */ <MBUS_ID(0x01, 0x1f) 0 0x100000>, /* CS4 */ <MBUS_ID(0x01, 0x5f) 0 0x100000>, /* CS5 */ <MBUS_ID(0x01, 0x9f) 0 0x100000>, /* CS6 */ <MBUS_ID(0x01, 0xdf) 0 0x100000>; /* CS7 */ Same for the ranges below. Are you sure about the 1MB length for each one? I don't remember seeing this limitation in the datasheet, so maybe the length should be specified as 0xffffffff (we can't use 0x10000000 unfortunately as that doesn't fit within a 32-bit cell), to make it possible to map larger register areas. > +To enable the direct mode, the board specific 'ranges' property in the > +'soc' node needs to add the entries for the desired SPI controllers > +and its chip-selects that are used in the direct mode instead of PIO > +mode. Here an example for this (SPI controller 0, device 1 and SPI > +controller 1, device 2 are used in direct mode. All other SPI device > +are used in the default indirect (PIO) mode): > + soc { > + /* > + * Enable the SPI direct access by configuring an entry > + * here in the board-specific ranges property > + */ > + ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000 /* internal regs */ > + MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000 /* BootROM */ > + MBUS_ID(0x01, 0x5e) 0 0 0xf1100000 0x100000 /* SPI0-DEV1 */ > + MBUS_ID(0x01, 0x9a) 0 0 0xf1200000 0x100000>; /* SPI1-DEV2 */ > + > +For further information on the MBus bindings, please see the MBus > +DT documentation: > +Documentation/devicetree/bindings/bus/mvebu-mbus.txt This does however raise an interesting question about the size that we actually want to map: Your patch at the moment maps the entire register area that is listed for a given CS, which simplifies the code, but it means that we can't easily provide a smaller map in the mbus ranges when we know we don't need as much address space. I had not considered that in my previous emails. This is not a problem for the external bus because we can just map however much space the attached device requires, but for SPI devices, the address field in DT is just the CS number, not a range of addresses and the partitions of e.g. an SPI NOR flash are then in a separate DT address space that does not get translated into CPU addresses using 'ranges'. This would be easier if have a conclusive proof that 1MB per CS always enough. Is this something that is guaranteed in the SPI spec or the documentation for this controller? Arnd
On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote: > This would be easier if have a conclusive proof that 1MB per CS always enough. > Is this something that is guaranteed in the SPI spec or the documentation for > this controller? There's no spec for SPI but if there were it'd be hard to see it imposing a limit, one can transfer data as long as the bus is clocked (which some things like ADCs and DACs make use of).
On Monday 18 April 2016 12:04:15 Mark Brown wrote: > On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote: > > > This would be easier if have a conclusive proof that 1MB per CS always enough. > > Is this something that is guaranteed in the SPI spec or the documentation for > > this controller? > > There's no spec for SPI but if there were it'd be hard to see it > imposing a limit, one can transfer data as long as the bus is clocked > (which some things like ADCs and DACs make use of). > I just reread Stefan's patch and realized that I had fundamentally misunderstood how the transfer is done: I thought the offset inside of the window was used to address a NOR flash directly, but it seems it is just used to send arbitrarily long commands. This means that the 1MB window is probably a reasonable size (provided that the (most importantly) the SPI-NOR driver can guarantee that it never exceeds this). It also means that we are probably better off using the single-window mode of the SPI controller, where all CS lines share a single mbus window. The only real difference here is that we can map all endpoints using a single contiguous window into the CPU address space if we want, or we can still map them separately on a given board if that results in a nicer layout. Arnd
On 18.04.2016 13:32, Arnd Bergmann wrote: > On Monday 18 April 2016 12:04:15 Mark Brown wrote: >> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote: >> >>> This would be easier if have a conclusive proof that 1MB per CS always enough. >>> Is this something that is guaranteed in the SPI spec or the documentation for >>> this controller? >> >> There's no spec for SPI but if there were it'd be hard to see it >> imposing a limit, one can transfer data as long as the bus is clocked >> (which some things like ADCs and DACs make use of). >> > > I just reread Stefan's patch and realized that I had fundamentally > misunderstood how the transfer is done: I thought the offset inside of > the window was used to address a NOR flash directly, but it seems > it is just used to send arbitrarily long commands. Yes. SPI NOR flash support definitely needs some additional patches. To support the address and commands being inserted in the SPI messages via the corresponding registers. Again, this patch is only intended and tested for very "simple" TX messages without any addresses being inserted at all. > This means that the 1MB window is probably a reasonable size (provided > that the (most importantly) the SPI-NOR driver can guarantee that it > never exceeds this). I have no problem switching from 1MiB to using 0xffffffff in the SPI controller 'reg' node to allow even bigger windows in v6. > It also means that we are probably better off using the single-window mode > of the SPI controller, where all CS lines share a single mbus window. > The only real difference here is that we can map all endpoints using a > single contiguous window into the CPU address space if we want, or we > can still map them separately on a given board if that results in a nicer > layout. Frankly, I've just read about this single-window mode for the first time. What I miss here in this mode though, is the configuration of the SPI device (chip-select 0...7) for this direct-mode in the SPI driver. With the currently implemented separate window method, your suggested method is easy: Either is window can be mapped (direct-mode) or not (indirect-mode). With this single-window mode I'm missing the per-device configuration for direct-mode. Or what is your idea on how to configure this access-mode in this case? Thanks, Stefan
On Monday 18 April 2016 15:00:52 Stefan Roese wrote: > On 18.04.2016 13:32, Arnd Bergmann wrote: > > On Monday 18 April 2016 12:04:15 Mark Brown wrote: > >> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote: > >> > >>> This would be easier if have a conclusive proof that 1MB per CS always enough. > >>> Is this something that is guaranteed in the SPI spec or the documentation for > >>> this controller? > >> > >> There's no spec for SPI but if there were it'd be hard to see it > >> imposing a limit, one can transfer data as long as the bus is clocked > >> (which some things like ADCs and DACs make use of). > >> > > > > I just reread Stefan's patch and realized that I had fundamentally > > misunderstood how the transfer is done: I thought the offset inside of > > the window was used to address a NOR flash directly, but it seems > > it is just used to send arbitrarily long commands. > > Yes. SPI NOR flash support definitely needs some additional patches. > To support the address and commands being inserted in the SPI > messages via the corresponding registers. Again, this patch is > only intended and tested for very "simple" TX messages without any > addresses being inserted at all. Ok, so I looked up the datasheet again and understood now that it actually does both ways: a) the direct read/write that I had always assumed it did with automatic cmd/addr/data generation, and b) the mode where you do each write transfer separately as implemented by your patch. Those two modes seem to be wildly different in their needs for address space: - For a) we definitely need the mapping to be the same size as the addressable storage of the SPI-NOR (or other device that fits into the cmd/addr/data pattern supported by the hardware) - For b) the address is ignored and at most 32 bytes are transfered, so whatever the smallest MBUS mapping window size is would certainly be enough. As a side-note, I see that you use a regular devm_ioremap_resource(), which I assume prevents the transfers from ever being more than a single 4-byte word, while burst mode with up to 32 bytes likely requires a write-combining mapping. > > This means that the 1MB window is probably a reasonable size (provided > > that the (most importantly) the SPI-NOR driver can guarantee that it > > never exceeds this). > > I have no problem switching from 1MiB to using 0xffffffff in the > SPI controller 'reg' node to allow even bigger windows in v6. How do you decide how much of the window to map though? Using 0xffffffff is nice because it better represents what the hardware looks like and doesn't limit you from having arbitrarily large mbus windows based on your needs, but you'd probably have to try mapping various sizes then to find the maximum supported one, or you'd have to do some ugly parsing of the parent node ranges. > > It also means that we are probably better off using the single-window mode > > of the SPI controller, where all CS lines share a single mbus window. > > The only real difference here is that we can map all endpoints using a > > single contiguous window into the CPU address space if we want, or we > > can still map them separately on a given board if that results in a nicer > > layout. > > Frankly, I've just read about this single-window mode for the first > time. What I miss here in this mode though, is the configuration of > the SPI device (chip-select 0...7) for this direct-mode in the SPI > driver. With the currently implemented separate window method, your > suggested method is easy: Either is window can be mapped (direct-mode) > or not (indirect-mode). With this single-window mode I'm missing the > per-device configuration for direct-mode. Or what is your idea on > how to configure this access-mode in this case? It would work the same way, the main difference is that for single-window mode all the devices get the same amount of address MBUS address space, e.g. 1MB per CS in a contiguous MBUS range, but you can still map subsections of it into the CPU, e.g. &mbus { ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000>, /* internal regs */ <MBUS_ID(0x01, 0x1e) 0x300000 0 0xf1100000 0x200000>, /* CS3+CS4 */ <MBUS_ID(0x01, 0x1d) 0x600000 0 0xf1300000 0x80000>, /* bootrom */ <MBUS_ID(0x01, 0x1e) 0x600000 0 0xf1380000 0x10000>; /* CS6 */ spi@f0,01,10600 { reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>, /* control */ <MBUS_ID(0x01, 0x1e) 0x000000 0x100000>, /* CS0 */ <MBUS_ID(0x01, 0x1e) 0x100000 0x100000>, /* CS1 */ <MBUS_ID(0x01, 0x1e) 0x200000 0x100000>, /* CS2 */ <MBUS_ID(0x01, 0x1e) 0x300000 0x100000>, /* CS3 */ <MBUS_ID(0x01, 0x1e) 0x400000 0x100000>, /* CS4 */ <MBUS_ID(0x01, 0x1e) 0x500000 0x100000>, /* CS5 */ <MBUS_ID(0x01, 0x1e) 0x600000 0x100000>, /* CS6 */ <MBUS_ID(0x01, 0x1e) 0x700000 0x100000>; /* CS7 */ }; }; In this example, you have to configure the SPI controller to have 1MB per CS in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode" registers. The mbus then contains two mappings, one 2MB window spanning CS3 and CS4 in a continuous CPU range, and one 64K window for a device at CS6, making it as small as possible to conserve CPU address space. Arnd
On 18.04.2016 15:57, Arnd Bergmann wrote: > On Monday 18 April 2016 15:00:52 Stefan Roese wrote: >> On 18.04.2016 13:32, Arnd Bergmann wrote: >>> On Monday 18 April 2016 12:04:15 Mark Brown wrote: >>>> On Mon, Apr 18, 2016 at 12:51:55PM +0200, Arnd Bergmann wrote: >>>> >>>>> This would be easier if have a conclusive proof that 1MB per CS always enough. >>>>> Is this something that is guaranteed in the SPI spec or the documentation for >>>>> this controller? >>>> >>>> There's no spec for SPI but if there were it'd be hard to see it >>>> imposing a limit, one can transfer data as long as the bus is clocked >>>> (which some things like ADCs and DACs make use of). >>>> >>> >>> I just reread Stefan's patch and realized that I had fundamentally >>> misunderstood how the transfer is done: I thought the offset inside of >>> the window was used to address a NOR flash directly, but it seems >>> it is just used to send arbitrarily long commands. >> >> Yes. SPI NOR flash support definitely needs some additional patches. >> To support the address and commands being inserted in the SPI >> messages via the corresponding registers. Again, this patch is >> only intended and tested for very "simple" TX messages without any >> addresses being inserted at all. > > Ok, so I looked up the datasheet again and understood now that it > actually does both ways: a) the direct read/write that I had always assumed > it did with automatic cmd/addr/data generation, and b) the mode where > you do each write transfer separately as implemented by your patch. > > Those two modes seem to be wildly different in their needs for address > space: > > - For a) we definitely need the mapping to be the same size as the > addressable storage of the SPI-NOR (or other device that fits > into the cmd/addr/data pattern supported by the hardware) > > - For b) the address is ignored and at most 32 bytes are transfered, > so whatever the smallest MBUS mapping window size is would certainly > be enough. > > As a side-note, I see that you use a regular devm_ioremap_resource(), > which I assume prevents the transfers from ever being more than > a single 4-byte word, while burst mode with up to 32 bytes likely > requires a write-combining mapping. Thanks for the hint. I've tested again with devm_ioremap_wc() and the resulting SPI TX time is identical to the one using the driver with the uncached (non write-combined) mapping. This is most likely a result of the already fully saturated SPI bus speed being the bottle-neck here. A bit more testing has shown, that using devm_ioremap_wc() leads to sporadic failures in the FPGA image downloading though. Now I'm unsure, if its really worth to add some cache flushing after the memcpy() call here (flush_cache_vmap ?) or better keep the uncached ioremap in place. >>> This means that the 1MB window is probably a reasonable size (provided >>> that the (most importantly) the SPI-NOR driver can guarantee that it >>> never exceeds this). >> >> I have no problem switching from 1MiB to using 0xffffffff in the >> SPI controller 'reg' node to allow even bigger windows in v6. > > How do you decide how much of the window to map though? Using 0xffffffff > is nice because it better represents what the hardware looks like > and doesn't limit you from having arbitrarily large mbus windows > based on your needs, but you'd probably have to try mapping various > sizes then to find the maximum supported one, or you'd have to do > some ugly parsing of the parent node ranges. This is for the single-window mode, right? My comment above was still referring to the original implementation. And yes, this gets a bit complicated and is most likely not really necessary. So I would prefer to keep the mapping fixed to a max. of 1MiB for now. >>> It also means that we are probably better off using the single-window mode >>> of the SPI controller, where all CS lines share a single mbus window. >>> The only real difference here is that we can map all endpoints using a >>> single contiguous window into the CPU address space if we want, or we >>> can still map them separately on a given board if that results in a nicer >>> layout. >> >> Frankly, I've just read about this single-window mode for the first >> time. What I miss here in this mode though, is the configuration of >> the SPI device (chip-select 0...7) for this direct-mode in the SPI >> driver. With the currently implemented separate window method, your >> suggested method is easy: Either is window can be mapped (direct-mode) >> or not (indirect-mode). With this single-window mode I'm missing the >> per-device configuration for direct-mode. Or what is your idea on >> how to configure this access-mode in this case? > > It would work the same way, the main difference is that for single-window > mode all the devices get the same amount of address MBUS address space, > e.g. 1MB per CS in a contiguous MBUS range, but you can still map > subsections of it into the CPU, e.g. > > > &mbus { > ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000>, /* internal regs */ > <MBUS_ID(0x01, 0x1e) 0x300000 0 0xf1100000 0x200000>, /* CS3+CS4 */ > <MBUS_ID(0x01, 0x1d) 0x600000 0 0xf1300000 0x80000>, /* bootrom */ > <MBUS_ID(0x01, 0x1e) 0x600000 0 0xf1380000 0x10000>; /* CS6 */ > > spi@f0,01,10600 { > reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28>, /* control */ > <MBUS_ID(0x01, 0x1e) 0x000000 0x100000>, /* CS0 */ > <MBUS_ID(0x01, 0x1e) 0x100000 0x100000>, /* CS1 */ > <MBUS_ID(0x01, 0x1e) 0x200000 0x100000>, /* CS2 */ > <MBUS_ID(0x01, 0x1e) 0x300000 0x100000>, /* CS3 */ > <MBUS_ID(0x01, 0x1e) 0x400000 0x100000>, /* CS4 */ > <MBUS_ID(0x01, 0x1e) 0x500000 0x100000>, /* CS5 */ > <MBUS_ID(0x01, 0x1e) 0x600000 0x100000>, /* CS6 */ > <MBUS_ID(0x01, 0x1e) 0x700000 0x100000>; /* CS7 */ > }; > }; > > In this example, you have to configure the SPI controller to have 1MB per CS > in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode" > registers. The mbus then contains two mappings, one 2MB window spanning > CS3 and CS4 in a continuous CPU range, and one 64K window for a device at > CS6, making it as small as possible to conserve CPU address space. Thanks. I've prepared a new version implementing this single- window mode. Please let me know how you prefer the cache issue above to be handled (uncached mapping, flush or ?) and I'll update the patch accordingly. Thanks, Stefan
On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote: > On 18.04.2016 15:57, Arnd Bergmann wrote: > > On Monday 18 April 2016 15:00:52 Stefan Roese wrote: > >> On 18.04.2016 13:32, Arnd Bergmann wrote: > > Ok, so I looked up the datasheet again and understood now that it > > actually does both ways: a) the direct read/write that I had always assumed > > it did with automatic cmd/addr/data generation, and b) the mode where > > you do each write transfer separately as implemented by your patch. > > > > Those two modes seem to be wildly different in their needs for address > > space: > > > > - For a) we definitely need the mapping to be the same size as the > > addressable storage of the SPI-NOR (or other device that fits > > into the cmd/addr/data pattern supported by the hardware) > > > > - For b) the address is ignored and at most 32 bytes are transfered, > > so whatever the smallest MBUS mapping window size is would certainly > > be enough. > > > > As a side-note, I see that you use a regular devm_ioremap_resource(), > > which I assume prevents the transfers from ever being more than > > a single 4-byte word, while burst mode with up to 32 bytes likely > > requires a write-combining mapping. > > Thanks for the hint. I've tested again with devm_ioremap_wc() and the > resulting SPI TX time is identical to the one using the driver with > the uncached (non write-combined) mapping. This is most likely a > result of the already fully saturated SPI bus speed being the > bottle-neck here. Ok > A bit more testing has shown, that using devm_ioremap_wc() leads > to sporadic failures in the FPGA image downloading though. Now > I'm unsure, if its really worth to add some cache flushing after > the memcpy() call here (flush_cache_vmap ?) or better keep the > uncached ioremap in place. The cache is not involved here, as devm_ioremap_wc is still uncached. However it's possible that memcpy has other issues here. If you don't gain anything from sending more than 32 bits at a time, writesl() might be a better alternative, but it requires the data to be 32-bit aligned in RAM. If the buffer is not aligned, the memcpy() might already cause problems in case the kernel implementation does not send the data in the correct order. memcpy_toio() could also be helpful here. > >>> This means that the 1MB window is probably a reasonable size (provided > >>> that the (most importantly) the SPI-NOR driver can guarantee that it > >>> never exceeds this). > >> > >> I have no problem switching from 1MiB to using 0xffffffff in the > >> SPI controller 'reg' node to allow even bigger windows in v6. > > > > How do you decide how much of the window to map though? Using 0xffffffff > > is nice because it better represents what the hardware looks like > > and doesn't limit you from having arbitrarily large mbus windows > > based on your needs, but you'd probably have to try mapping various > > sizes then to find the maximum supported one, or you'd have to do > > some ugly parsing of the parent node ranges. > > This is for the single-window mode, right? My comment above was > still referring to the original implementation. And yes, this > gets a bit complicated and is most likely not really necessary. > So I would prefer to keep the mapping fixed to a max. of 1MiB for > now. No, I also meant the separate windows here. Based on the discussion above, I think the driver can ideally just ioremap 4KB for devices other than SPI-NOR, and then you can list the 0xffffffff length. > > In this example, you have to configure the SPI controller to have 1MB per CS > > in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode" > > registers. The mbus then contains two mappings, one 2MB window spanning > > CS3 and CS4 in a continuous CPU range, and one 64K window for a device at > > CS6, making it as small as possible to conserve CPU address space. > > Thanks. I've prepared a new version implementing this single- > window mode. Please let me know how you prefer the cache issue > above to be handled (uncached mapping, flush or ?) and I'll > update the patch accordingly. If a write-combining mapping doesn't help you here, just use the normal ioremap as before. The single-window mode won't really work with SPI-NOR unless you make each window large enough to hold the largest possible flash (128MB with single-window, 256MB with double-window), but that size makes it harder to gain much from saving mbus windows when you trade them for gigantic CPU address space consumption. Therefore, we probably want the separate-window mode to allow the combination of large SPI flashes with other SPI devices in direct-access mode. We can also implement the single (and/or double) window mode in addition to that, but realistically we probably don't need that as all machines we support in the kernel today have at most one non-flash device. Arnd
On 19.04.2016 15:41, Arnd Bergmann wrote: > On Tuesday 19 April 2016 11:42:25 Stefan Roese wrote: >> On 18.04.2016 15:57, Arnd Bergmann wrote: >>> On Monday 18 April 2016 15:00:52 Stefan Roese wrote: >>>> On 18.04.2016 13:32, Arnd Bergmann wrote: >>> Ok, so I looked up the datasheet again and understood now that it >>> actually does both ways: a) the direct read/write that I had always assumed >>> it did with automatic cmd/addr/data generation, and b) the mode where >>> you do each write transfer separately as implemented by your patch. >>> >>> Those two modes seem to be wildly different in their needs for address >>> space: >>> >>> - For a) we definitely need the mapping to be the same size as the >>> addressable storage of the SPI-NOR (or other device that fits >>> into the cmd/addr/data pattern supported by the hardware) >>> >>> - For b) the address is ignored and at most 32 bytes are transfered, >>> so whatever the smallest MBUS mapping window size is would certainly >>> be enough. >>> >>> As a side-note, I see that you use a regular devm_ioremap_resource(), >>> which I assume prevents the transfers from ever being more than >>> a single 4-byte word, while burst mode with up to 32 bytes likely >>> requires a write-combining mapping. >> >> Thanks for the hint. I've tested again with devm_ioremap_wc() and the >> resulting SPI TX time is identical to the one using the driver with >> the uncached (non write-combined) mapping. This is most likely a >> result of the already fully saturated SPI bus speed being the >> bottle-neck here. > > Ok > >> A bit more testing has shown, that using devm_ioremap_wc() leads >> to sporadic failures in the FPGA image downloading though. Now >> I'm unsure, if its really worth to add some cache flushing after >> the memcpy() call here (flush_cache_vmap ?) or better keep the >> uncached ioremap in place. > > The cache is not involved here, as devm_ioremap_wc is still uncached. > However it's possible that memcpy has other issues here. If you > don't gain anything from sending more than 32 bits at a time, > writesl() might be a better alternative, but it requires the data > to be 32-bit aligned in RAM. If the buffer is not aligned, the > memcpy() might already cause problems in case the kernel > implementation does not send the data in the correct order. > > memcpy_toio() could also be helpful here. > >>>>> This means that the 1MB window is probably a reasonable size (provided >>>>> that the (most importantly) the SPI-NOR driver can guarantee that it >>>>> never exceeds this). >>>> >>>> I have no problem switching from 1MiB to using 0xffffffff in the >>>> SPI controller 'reg' node to allow even bigger windows in v6. >>> >>> How do you decide how much of the window to map though? Using 0xffffffff >>> is nice because it better represents what the hardware looks like >>> and doesn't limit you from having arbitrarily large mbus windows >>> based on your needs, but you'd probably have to try mapping various >>> sizes then to find the maximum supported one, or you'd have to do >>> some ugly parsing of the parent node ranges. >> >> This is for the single-window mode, right? My comment above was >> still referring to the original implementation. And yes, this >> gets a bit complicated and is most likely not really necessary. >> So I would prefer to keep the mapping fixed to a max. of 1MiB for >> now. > > No, I also meant the separate windows here. Based on the discussion > above, I think the driver can ideally just ioremap 4KB for devices > other than SPI-NOR, and then you can list the 0xffffffff length. > >>> In this example, you have to configure the SPI controller to have 1MB per CS >>> in single-window mode using the "SPI DecodeMode" and "SPI CS Adress Decode" >>> registers. The mbus then contains two mappings, one 2MB window spanning >>> CS3 and CS4 in a continuous CPU range, and one 64K window for a device at >>> CS6, making it as small as possible to conserve CPU address space. >> >> Thanks. I've prepared a new version implementing this single- >> window mode. Please let me know how you prefer the cache issue >> above to be handled (uncached mapping, flush or ?) and I'll >> update the patch accordingly. > > If a write-combining mapping doesn't help you here, just use the normal > ioremap as before. Okay, switching back to the normal ioremap. > The single-window mode won't really work with SPI-NOR unless you make each > window large enough to hold the largest possible flash (128MB with > single-window, 256MB with double-window), but that size makes it harder > to gain much from saving mbus windows when you trade them for gigantic > CPU address space consumption. > > Therefore, we probably want the separate-window mode to allow the > combination of large SPI flashes with other SPI devices in direct-access > mode. We can also implement the single (and/or double) window mode > in addition to that, but realistically we probably don't need that > as all machines we support in the kernel today have at most one > non-flash device. Yes, this is also my feeling. While implementing this 0xffffffff size in the SPI controller 'reg' DT node, I just stumbled over the following problem: The resulting size of the area to map (of_address_to_resource) is not the minimum of the 'soc' 'ranges' entry and this 0xffffffff as I would have expected. But its always 0xffffffff. Do you know if this is general problem with this approach or perhaps a problem / bug in the DT address probing / translation? Thanks, Stefan
On Wednesday 20 April 2016 09:38:29 Stefan Roese wrote: > > The single-window mode won't really work with SPI-NOR unless you make each > > window large enough to hold the largest possible flash (128MB with > > single-window, 256MB with double-window), but that size makes it harder > > to gain much from saving mbus windows when you trade them for gigantic > > CPU address space consumption. > > > > Therefore, we probably want the separate-window mode to allow the > > combination of large SPI flashes with other SPI devices in direct-access > > mode. We can also implement the single (and/or double) window mode > > in addition to that, but realistically we probably don't need that > > as all machines we support in the kernel today have at most one > > non-flash device. > > Yes, this is also my feeling. While implementing this 0xffffffff > size in the SPI controller 'reg' DT node, I just stumbled over the > following problem: The resulting size of the area to map > (of_address_to_resource) is not the minimum of the 'soc' 'ranges' > entry and this 0xffffffff as I would have expected. But its always > 0xffffffff. Do you know if this is general problem with this > approach or perhaps a problem / bug in the DT address probing / > translation? That is the problem I was trying to explain a few mails back. The creation of the platform resources does not look at what size can be translated. I had not considered the possibility that this could be changed (which would make things work here), but there are also reasons for not changing it: if we make the resource smaller (possibly even changing the start rather than the end), that may be even more unexpected to other driver writers when the parent ranges are smaller than the device addresses. However, I think there is a simpler solution here: for normal devices, just always map just a single page using ioremap, independent of what size the mbus mapping is. If you change from memcpy() to writesl(), all writes will go to address zero anyway, and we are already guaranteed (by not using ioremap_wc) that there won't be any transfers that span more than four bytes at a time (8 bytes once we use the driver on 64-bit machines with writesq()). For the direct spi-nor or spi-nand modes, we need extra code in the spi controller driver, and that code will then also have to ioremap_wc() the entire device. That ioremap should fail if the ranges property lists a smaller area than the entire device and it can fall back to the mode of using the 4k page map. Arnd
diff --git a/Documentation/devicetree/bindings/spi/spi-orion.txt b/Documentation/devicetree/bindings/spi/spi-orion.txt index 98bc698..ee9b064 100644 --- a/Documentation/devicetree/bindings/spi/spi-orion.txt +++ b/Documentation/devicetree/bindings/spi/spi-orion.txt @@ -8,7 +8,13 @@ Required properties: - "marvell,armada-380-spi", for the Armada 38x SoCs - "marvell,armada-390-spi", for the Armada 39x SoCs - "marvell,armada-xp-spi", for the Armada XP SoCs -- reg : offset and length of the register set for the device +- reg : offset and length of the register set for the device. + This property can optionally have additional entries to configure + the SPI direct access mode that some of the Marvell SoCs support + additionally to the normal indirect access (PIO) mode. The values + for the MBus "target" and "attribute" are defined in the Marvell + SoC "Functional Specifications" Manual in the chapter "Marvell + Core Processor Address Decoding". - cell-index : Which of multiple SPI controllers is this. Optional properties: - interrupts : Is currently not used. @@ -23,3 +29,42 @@ Example: interrupts = <23>; status = "disabled"; }; + +Example with SPI direct mode support (optionally): + spi0: spi@10600 { + compatible = "marvell,orion-spi"; + #address-cells = <1>; + #size-cells = <0>; + cell-index = <0>; + reg = <MBUS_ID(0xf0, 0x01) 0x10600 0x28 /* control */ + MBUS_ID(0x01, 0x1e) 0 0x100000 /* CS0 */ + MBUS_ID(0x01, 0x5e) 0 0x100000 /* CS1 */ + MBUS_ID(0x01, 0x9e) 0 0x100000 /* CS2 */ + MBUS_ID(0x01, 0xde) 0 0x100000 /* CS3 */ + MBUS_ID(0x01, 0x1f) 0 0x100000 /* CS4 */ + MBUS_ID(0x01, 0x5f) 0 0x100000 /* CS5 */ + MBUS_ID(0x01, 0x9f) 0 0x100000 /* CS6 */ + MBUS_ID(0x01, 0xdf) 0 0x100000>; /* CS7 */ + interrupts = <23>; + status = "disabled"; + }; + +To enable the direct mode, the board specific 'ranges' property in the +'soc' node needs to add the entries for the desired SPI controllers +and its chip-selects that are used in the direct mode instead of PIO +mode. Here an example for this (SPI controller 0, device 1 and SPI +controller 1, device 2 are used in direct mode. All other SPI device +are used in the default indirect (PIO) mode): + soc { + /* + * Enable the SPI direct access by configuring an entry + * here in the board-specific ranges property + */ + ranges = <MBUS_ID(0xf0, 0x01) 0 0 0xf1000000 0x100000 /* internal regs */ + MBUS_ID(0x01, 0x1d) 0 0 0xfff00000 0x100000 /* BootROM */ + MBUS_ID(0x01, 0x5e) 0 0 0xf1100000 0x100000 /* SPI0-DEV1 */ + MBUS_ID(0x01, 0x9a) 0 0 0xf1200000 0x100000>; /* SPI1-DEV2 */ + +For further information on the MBus bindings, please see the MBus +DT documentation: +Documentation/devicetree/bindings/bus/mvebu-mbus.txt diff --git a/drivers/spi/spi-orion.c b/drivers/spi/spi-orion.c index a87cfd4..63c4453 100644 --- a/drivers/spi/spi-orion.c +++ b/drivers/spi/spi-orion.c @@ -18,6 +18,7 @@ #include <linux/module.h> #include <linux/pm_runtime.h> #include <linux/of.h> +#include <linux/of_address.h> #include <linux/of_device.h> #include <linux/clk.h> #include <linux/sizes.h> @@ -43,6 +44,9 @@ #define ORION_SPI_INT_CAUSE_REG 0x10 #define ORION_SPI_TIMING_PARAMS_REG 0x18 +/* Register for the "Direct Mode" */ +#define SPI_DIRECT_WRITE_CONFIG_REG 0x20 + #define ORION_SPI_TMISO_SAMPLE_MASK (0x3 << 6) #define ORION_SPI_TMISO_SAMPLE_1 (1 << 6) #define ORION_SPI_TMISO_SAMPLE_2 (2 << 6) @@ -78,11 +82,18 @@ struct orion_spi_dev { bool is_errata_50mhz_ac; }; +struct orion_direct_acc { + void __iomem *vaddr; + u32 size; +}; + struct orion_spi { struct spi_master *master; void __iomem *base; struct clk *clk; const struct orion_spi_dev *devdata; + + struct orion_direct_acc direct_access[ORION_NUM_CHIPSELECTS]; }; static inline void __iomem *spi_reg(struct orion_spi *orion_spi, u32 reg) @@ -367,15 +378,49 @@ orion_spi_write_read_16bit(struct spi_device *spi, return 1; } +static size_t orion_spi_max_transfer_size(struct spi_device *spi) +{ + struct orion_spi *orion_spi; + int cs = spi->chip_select; + + orion_spi = spi_master_get_devdata(spi->master); + + /* Return max size for direct mode */ + if (orion_spi->direct_access[cs].vaddr) + return orion_spi->direct_access[cs].size; + + /* Return the default max size if the direct mode is not used */ + return SIZE_MAX; +} + static unsigned int orion_spi_write_read(struct spi_device *spi, struct spi_transfer *xfer) { unsigned int count; int word_len; + struct orion_spi *orion_spi; + int cs = spi->chip_select; word_len = spi->bits_per_word; count = xfer->len; + orion_spi = spi_master_get_devdata(spi->master); + + /* + * Use SPI direct write mode if base address is available. Otherwise + * fall back to PIO mode for this transfer. + */ + if ((orion_spi->direct_access[cs].vaddr) && (xfer->tx_buf) && + (count <= orion_spi->direct_access[cs].size) && (word_len == 8)) { + /* + * Send the tx-data to the SPI device via the direct + * mapped address window + */ + memcpy(orion_spi->direct_access[cs].vaddr, xfer->tx_buf, count); + + return count; + } + if (word_len == 8) { const u8 *tx = xfer->tx_buf; u8 *rx = xfer->rx_buf; @@ -425,6 +470,10 @@ static int orion_spi_reset(struct orion_spi *orion_spi) { /* Verify that the CS is deasserted */ orion_spi_clrbits(orion_spi, ORION_SPI_IF_CTRL_REG, 0x1); + + /* Don't deassert CS between the direct mapped SPI transfers */ + writel(0, spi_reg(orion_spi, SPI_DIRECT_WRITE_CONFIG_REG)); + return 0; } @@ -504,6 +553,7 @@ static int orion_spi_probe(struct platform_device *pdev) struct resource *r; unsigned long tclk_hz; int status = 0; + struct device_node *np; master = spi_alloc_master(&pdev->dev, sizeof(*spi)); if (master == NULL) { @@ -529,6 +579,7 @@ static int orion_spi_probe(struct platform_device *pdev) master->setup = orion_spi_setup; master->bits_per_word_mask = SPI_BPW_MASK(8) | SPI_BPW_MASK(16); master->auto_runtime_pm = true; + master->max_transfer_size = orion_spi_max_transfer_size; platform_set_drvdata(pdev, master); @@ -576,6 +627,42 @@ static int orion_spi_probe(struct platform_device *pdev) goto out_rel_clk; } + /* Scan all SPI devices of this controller for direct mapped devices */ + for_each_available_child_of_node(pdev->dev.of_node, np) { + u32 cs; + + /* Get chip-select number from the "reg" property */ + status = of_property_read_u32(np, "reg", &cs); + if (status) { + dev_err(&pdev->dev, + "%s has no valid 'reg' property (%d)\n", + np->full_name, status); + status = 0; + continue; + } + + /* + * Check if an address is configured for this SPI device. If + * not, the MBus mapping via the 'ranges' property in the 'soc' + * node is not configured and this device should not use the + * direct mode. In this case, just continue with the next + * device. + */ + status = of_address_to_resource(pdev->dev.of_node, cs + 1, r); + if (status) + continue; + + spi->direct_access[cs].vaddr = devm_ioremap_resource(&pdev->dev, + r); + if (IS_ERR(spi->direct_access[cs].vaddr)) { + status = PTR_ERR(spi->direct_access[cs].vaddr); + goto out_rel_clk; + } + + spi->direct_access[cs].size = resource_size(r); + dev_info(&pdev->dev, "CS%d configured for direct access\n", cs); + } + pm_runtime_set_active(&pdev->dev); pm_runtime_use_autosuspend(&pdev->dev); pm_runtime_set_autosuspend_delay(&pdev->dev, SPI_AUTOSUSPEND_TIMEOUT);
This patch adds support for the direct access mode to the Orion SPI driver which is used on the Marvell Armada based SoCs. In this direct mode, all data written to (or read from) a specifically mapped MBus window (linked to one SPI chip-select on one of the SPI controllers) will be transferred directly to the SPI bus. Without the need to control the SPI registers in between. This can improve the SPI transfer rate in such cases. Both, direct-read and -write mode are supported. But only the write mode has been tested. This mode especially benefits from the SPI direct mode, as the data bytes are written head-to-head to the SPI bus, without any additional addresses. One use-case for this direct write mode is, programming a FPGA bitstream image into the FPGA connected to the SPI bus at maximum speed. This mode is described in chapter "22.5.2 Direct Write to SPI" in the Marvell Armada XP Functional Spec Datasheet. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Mark Brown <broonie@kernel.org> --- v5: - Added some documentation for the direct mode to the orion-spi DT bindings document, including an example and a reference to the mbus DT bindings documentation. v4: - Add max_transfer_size() - Fall back from direct-write mode to PIO mode when - transfer size is exceedeed - non-8bit transfers - Restrict to direct-write mode for now. SPI direct read mode can be added later, when someone has a platform supporting this mode. v3: - Used static MBus windows again, as suggested by Arnd - No new DT bindings needed for this. The MBus windows need to be configured in the 'reg' property of the SPI controller. This is done in a separate patch for the Armada platforms. - The direct mapping is configured and enable by adding the specific MBus entry to the 'ranges' property of the 'soc' node in the board dts file. v2: - Use one MBus window for each SPI controller instead of one for for each SPI device. This MBus window is re-assigned, once the CS changes. - Assert the CS over the entire message - Don't restrict the direct access mode to only direct write mode - Add check for max size before using memcpy() - Remove spidev from DT bindings documentation .../devicetree/bindings/spi/spi-orion.txt | 47 +++++++++++- drivers/spi/spi-orion.c | 87 ++++++++++++++++++++++ 2 files changed, 133 insertions(+), 1 deletion(-)