Message ID | 3628E5E2-7EA9-488F-AF5F-A2E43D2D1E3E@martin.sperl.org (mailing list archive) |
---|---|
State | Accepted |
Commit | e3a2be3030e2fec27a2577d3c52203da090a4366 |
Headers | show |
On Sun, Mar 29, 2015 at 04:03:25PM +0200, Martin Sperl wrote:
> reduce interrupts/message
Applied, thanks. There's something wrong with the way you're sending
patches - long subject lines are being wrapped into the body of the
patch. Please check this, the easiest thing is most likely to be to use
git format-patch and git send-email to send your patches as that's
pretty much zero effort.
Please while looking at this also look into threading your patches into
a single thread per series, this makes things easier to work with.
On 03/29/2015 08:03 AM, Martin Sperl wrote: > reduce interrupts/message > > To reduce the number of interrupts/message we fill the FIFO before > enabling interrupts - for short messages this reduces the interrupt count > from 2 to 1 interrupt. > > There have been rare cases where short (<200ns) chip-select switches with > native CS have been observed during such operation, this is why this > optimization is only enabled for GPIO-CS. > diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c > + /* fill in fifo if we have gpio-cs > + * note that there have been rare events where the native-CS > + * flapped for <1us which may change the behaviour > + * with gpio-cs this does not happen, so it is implemented > + * only for this case > + */ > + if (gpio_is_valid(spi->cs_gpio)) { > + /* enable HW block, but without interrupts enabled > + * this would triggern an immediate interrupt > + */ > + bcm2835_wr(bs, BCM2835_SPI_CS, > + cs | BCM2835_SPI_CS_TA); > + /* fill in tx fifo as much as possible */ > + bcm2835_wr_fifo(bs); > + } I can understand perfectly why the code fills the FIFO before enabling interrupts; it avoids having to immediately service an interrupt simply to fill the FIFO. However, I'm not sure why this is in any way related to whether the chip-select GPIO is valid. Surely we always want to do this? How does the mechanism used to control chip selects influence whether we want to pre-fill the FIFO? -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 30, 2015 at 09:16:21PM -0600, Stephen Warren wrote: > On 03/29/2015 08:03 AM, Martin Sperl wrote: > > There have been rare cases where short (<200ns) chip-select switches with > > native CS have been observed during such operation, this is why this > > optimization is only enabled for GPIO-CS. > > + /* fill in fifo if we have gpio-cs > > + * note that there have been rare events where the native-CS > > + * flapped for <1us which may change the behaviour > > + * with gpio-cs this does not happen, so it is implemented > > + * only for this case > > + */ > However, I'm not sure why this is in any way related to whether the > chip-select GPIO is valid. Surely we always want to do this? How does > the mechanism used to control chip selects influence whether we want to > pre-fill the FIFO? I think both the comment and the commit message answer that question - something triggers spurious chip select changes?
> On 31.03.2015, at 05:16, Stephen Warren <swarren@wwwdotorg.org> wrote: > > I can understand perfectly why the code fills the FIFO before enabling > interrupts; it avoids having to immediately service an interrupt simply > to fill the FIFO. > > However, I'm not sure why this is in any way related to whether the > chip-select GPIO is valid. Surely we always want to do this? How does > the mechanism used to control chip selects influence whether we want to > pre-fill the FIFO? During the time I was building a DMA only driver I saw "rare" glitches in the CS line when using native CS. The "glitch" is that the CS drops to inactive for a short period of time - typically 1 sample length at 10MHz sample rate, so <0.1us. These "glitches" also once have been observed with the current driver when using native-CS, so I think it is prudent to avoid native-CS when enabling this optimization. Hence this limitation or maybe even the full move to GPIO-CS for all. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/30/2015 11:25 PM, Mark Brown wrote: > On Mon, Mar 30, 2015 at 09:16:21PM -0600, Stephen Warren wrote: >> On 03/29/2015 08:03 AM, Martin Sperl wrote: > >>> There have been rare cases where short (<200ns) chip-select >>> switches with native CS have been observed during such >>> operation, this is why this optimization is only enabled for >>> GPIO-CS. > >>> + /* fill in fifo if we have gpio-cs + * note that there have >>> been rare events where the native-CS + * flapped for <1us >>> which may change the behaviour + * with gpio-cs this does not >>> happen, so it is implemented + * only for this case + */ > >> However, I'm not sure why this is in any way related to whether >> the chip-select GPIO is valid. Surely we always want to do this? >> How does the mechanism used to control chip selects influence >> whether we want to pre-fill the FIFO? > > I think both the comment and the commit message answer that > question - something triggers spurious chip select changes? I must admit I don't feel either the commit message or the comment explain the situation. They certainly state that there are glitches in the "native CS" case, but that doesn't *explain* them. It seems more likely the under-filling the FIFO would cause periods where the FIFO was empty which would be aout the only case I could naively think of for the CS to be de-asserted. More investigation would be useful. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Apr 04, 2015 at 07:49:41PM -0600, Stephen Warren wrote: > On 03/30/2015 11:25 PM, Mark Brown wrote: > > On Mon, Mar 30, 2015 at 09:16:21PM -0600, Stephen Warren wrote: > >> On 03/29/2015 08:03 AM, Martin Sperl wrote: > >>> There have been rare cases where short (<200ns) chip-select > >>> switches with native CS have been observed during such > >>> operation, this is why this optimization is only enabled for > >>> GPIO-CS. > >> However, I'm not sure why this is in any way related to whether > >> the chip-select GPIO is valid. Surely we always want to do this? > >> How does the mechanism used to control chip selects influence > >> whether we want to pre-fill the FIFO? > > I think both the comment and the commit message answer that > > question - something triggers spurious chip select changes? > I must admit I don't feel either the commit message or the comment > explain the situation. They certainly state that there are glitches in > the "native CS" case, but that doesn't *explain* them. It seems more > likely the under-filling the FIFO would cause periods where the FIFO > was empty which would be aout the only case I could naively think of > for the CS to be de-asserted. More investigation would be useful. Right, and I have to say I do suspect that the underlying thing is that the FIFO is underrunning, but as far as the optimization is concerned that's a separate thing. The reason this isn't enabled for native chip selects is that it's not working, the reason it's not working is something that should indeed probably be investigated.
> On 06.04.2015, at 19:21, Mark Brown <broonie@kernel.org> wrote: > Right, and I have to say I do suspect that the underlying thing is that > the FIFO is underrunning, but as far as the optimization is concerned > that's a separate thing. The reason this isn't enabled for native chip > selects is that it's not working, the reason it's not working is > something that should indeed probably be investigated. Actually it happens exactly when setting the CS-register with the interrupt flags enabled - typically observed in the middle of a transmit of a byte the CS jumps, but the clock and data continue the transfers correctly As the CS register contains the interrupt flags as well as the control for the native-chip-selects this is impacting the chip select lines in native mode. I have never seen a glitch on the CLK or MISO/MOSI line in my logic analyzer only CS jumps in the order of 1 sample in the logic analyzer - @20MHz sample rate, so in the order of 50ns, probably less... On top that happens only under some rare circumstances and can get rarely observed - lots of samples (and memory) are needed to catch that one, which typically results in a message in dmesg due to some unexpected response crc checksum errors stalled processing. This is from my experience from over a year ago with a fully DMA pipelined driver that did all the programming of SPI in DMA as well and there were some reports of people seeing "glitches" before I moved to GPIO-CS (equivalents) See also: http://www.raspberrypi.org/forums/viewtopic.php?f=44&t=19489&start=125#p451817 for some observations mostly related to CLEAR_TX/CLEAR_RX that also de-assert CS for short periods. I also saw it _once_ in one of my traces when using native_cs 2-3 weeks ago, but unfortunately I did not keep the capture... Hence the requirement to use of gpio-cs when running this optimization, as then the CS is not managed by the HW itself. I have been filling (dd if=/dev/zero) a 2GB SD card repeatedly (more than 9 times) for the last 12 hours and I have seen no issues - besides the spi bus being occupied and spi_bus_locked for 17ms for the data-transfers, which obviously impacts packet-reception (packet loss) on CAN and ethernet network - but that is expected due to this heavy load and long transfers. Note that this test was executed using the latest patches I have sent running at 16k interrupts/s and 4400 context-switches/second at 4MHz SPI-bus-speed for the SD-card. The transfer of 1.6GB to the SD-card (filling up the partition) took about 4350 seconds or 366kB/s - 3600s would be the ideal situation without overheads, latencies or similar. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Now remembering a bit more from what I have observed: > On 07.04.2015, at 08:25, Martin Sperl <kernel@martin.sperl.org> wrote: > > See also: > http://www.raspberrypi.org/forums/viewtopic.php?f=44&t=19489&start=125#p451817 > for some observations mostly related to CLEAR_TX/CLEAR_RX that also > de-assert CS for short periods. Note that this TX/RX reset with native CS is also inhibiting the use of DMA for any transfers not a multiple of 4 (DMA transfers data into the FIFO in words not bytes). Because after those transfers we have to reset the FIFO or the remaining bytes still in the FIFO will get shifted out before a subsequent transfer. So for any DMA enabled driver we need this kind of gpio-cs to avoid this CS-glitch situation unless we want a "DMA only on multiple of 4" with native-cs situation... -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/07/2015 12:25 AM, Martin Sperl wrote: > >> On 06.04.2015, at 19:21, Mark Brown <broonie@kernel.org> wrote: >> Right, and I have to say I do suspect that the underlying thing is that >> the FIFO is underrunning, but as far as the optimization is concerned >> that's a separate thing. The reason this isn't enabled for native chip >> selects is that it's not working, the reason it's not working is >> something that should indeed probably be investigated. > > Actually it happens exactly when setting the CS-register with the > interrupt flags enabled - typically observed in the middle of a transmit > of a byte the CS jumps, but the clock and data continue the transfers > correctly > > As the CS register contains the interrupt flags as well as the > control for the native-chip-selects this is impacting the chip select > lines in native mode. Is the driver simply programming the HW incorrectly then? I would expect the driver to do something roughly like: * Set up the HW to execute the transaction; everything except enabling IRQs and telling the HW to "go" * Clear stale IRQ status (perhaps do this right at the start) * Enable IRQs * Tell the HW to "go" ... then not touch any CS-related register for the entire transfer. There shouldn't be a need to enable/disable IRQs during the transfer; just leave them enabled the entire time, until all bytes have been transferred. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 07, 2015 at 09:39:03AM -0600, Stephen Warren wrote: > On 04/07/2015 12:25 AM, Martin Sperl wrote: > >As the CS register contains the interrupt flags as well as the > >control for the native-chip-selects this is impacting the chip select > >lines in native mode. > Is the driver simply programming the HW incorrectly then? I would expect the > driver to do something roughly like: > * Set up the HW to execute the transaction; everything except enabling IRQs > and telling the HW to "go" > * Clear stale IRQ status (perhaps do this right at the start) > * Enable IRQs > * Tell the HW to "go" > ... then not touch any CS-related register for the entire transfer. There > shouldn't be a need to enable/disable IRQs during the transfer; just leave > them enabled the entire time, until all bytes have been transferred. We will need to make sure /CS is kept asserted between transfers in a message too.
> On 07.04.2015, at 17:39, Stephen Warren <swarren@wwwdotorg.org> wrote: > > Is the driver simply programming the HW incorrectly then? I would expect the driver to do something roughly like: > > * Set up the HW to execute the transaction; everything except enabling IRQs and telling the HW to "go" > * Clear stale IRQ status (perhaps do this right at the start) > * Enable IRQs > * Tell the HW to "go" no it actually works differently: * set TA and some of the other flags (POL,...) in CS-REG, which activates the SPI-block and as soon as there is data it will start the transfer. if you set the IRQ flags here, then IRQ will trigger as soon as the write hits the HW-block, as the fifo is empty This is what the driver does right in 4.0rcX - it also sets the interrupt flags. But the problem is that it takes typcially 20us for the ISR to get REALLY executed, which means an unecessary delay of 20us before the transfer really starts. The prefill approach instead: * sets TA and some other flags, in CS,but leaves Interrupts disabled * fills in FIFO which initiates the transfer * sets the same values in CS-REG as above but now also with IRQ enabled At this point in time there is a slight chance that CS will toggle for <1us when using native CS - this does not happen with gpio-CS for obvious reasons. Similar for the situation where you request a transfer of 13 Bytes via DMA. You do this by filling in SPI-LEN with 13 to tell the engine to only shift 13 bytes out even if DMA fills in 16 bytes. When this transfer is finished then there are still 3 bytes in the FIFO, so you have to clean that by setting CLEAR_TX/CLEAR_RX in the CS-register. If you set only those 2 bit without modifying the others, then there is again a chance that native-CS get toggled for a short period of time. This is obviously not ideal when you have to keep CS low for the next spi_transfer. The only way around this was to actually cheat by using only CS2, and modifying the CSPOL0 and CSPOL1 to do what I want because the CS only seems to toggle for the "active" CS (as per bits 0:1 in CS). But that is essentially the same as using cs-gpio, but just uses a different set of registers. I guess that there is some sort of logic in the HW that re-evaluates the state of the native-chip-selects whenever CS-reg is written. Even for the case where there is no change there seems to be a short period of time when the CS is not driven (high or low), which, together with some pull-ups, is pulling those lines high - and this is what we are seeing in some rare cases. So any write to the CS-register can influence native chip selects but this only happens in rare cases typically showing after several hours of repeated spi-transfers (in my case after 20M CAN messages received and about 100M SPI messages). Using GPIO-cs solves these situations by not being influenced by the SPI-hardware in any way in the first place. Actually - if I think of it - even with the current driver in 4.0rcX which is configuring CS-reg on every spi_transfer in a spi_message could trigger the same behavior as well. But as spi_messages with multiple spi_transfers are rarely used and as the issue itself is only happening only with a low probability and some devices not detecting cs changes that are below a certain time the likleyhood of such a situation being detected are minimal and would typically get attributed to other factors... Hope that this answers the question and summarizes the observations that i have made and why we have to move to gpio-cs for those optimizations to become active... Martin-- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 07, 2015 at 07:05:01PM +0200, Martin Sperl wrote: > But as spi_messages with multiple spi_transfers are rarely used > and as the issue itself is only happening only with a low probability > and some devices not detecting cs changes that are below a certain time > the likleyhood of such a situation being detected are minimal and > would typically get attributed to other factors... For your workload perhaps. For some workloads they are very common (register reads are the most obvious example, they are typically a write transfer to provide the register address followed by a read transfer for the data).
> On 07.04.2015, at 19:40, Mark Brown <broonie@kernel.org> wrote: > > For your workload perhaps. For some workloads they are very common > (register reads are the most obvious example, they are typically a write > transfer to provide the register address followed by a read transfer for > the data). Which spi_write_then_read collapses into a single transfer. Still I try to run now 4 distinct devices for my testing and I believe the enc28j60 is one of those devices that does that and I have not seen an issue there during typical loads... Obviously there are some devices that are easier to load than others to trigger such a situation. It really just shows when running some transfers a lot of times... So the move to gpio-cs should be on our priority list. I will come up with a generic approach that allows also to set native CS if needed... (as sketched in an earlier mail) Martin -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Apr 07, 2015 at 08:18:57PM +0200, Martin Sperl wrote: > > On 07.04.2015, at 19:40, Mark Brown <broonie@kernel.org> wrote: > > For your workload perhaps. For some workloads they are very common > > (register reads are the most obvious example, they are typically a write > > transfer to provide the register address followed by a read transfer for > > the data). > Which spi_write_then_read collapses into a single transfer. No it doesn't. It creates a single message with two transfers (which is what I'd expect any user doing this to do, there are others that don't use the SPI helper APIs).
> On 07.04.2015, at 20:23, Mark Brown <broonie@kernel.org> wrote: > > No it doesn't. It creates a single message with two transfers (which is > what I'd expect any user doing this to do, there are others that don't > use the SPI helper APIs). Well after rereading it now - it does copying, so I was making a naiv assumption that it would do a single transfer, but then there is the SPI_3WIRE case that is special. In principle it could run in a single transfer at the cost of some memory (64 vs. 32 bytes) for anything that is not SPI_3WIRE. But then spi_transfer is also in the order of 32 bytes.... This definitely would speed up things for a "typical" interrupt-driven driver implementation using the transfer_one interface, as it would avoid one interrupt and 2 context switches. On the other hand this could also get optimized inside the framework at the cost of copying twice and we would optimize other cases as well. But that is some other project... -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/07/2015 11:05 AM, Martin Sperl wrote: > >> On 07.04.2015, at 17:39, Stephen Warren <swarren@wwwdotorg.org> wrote: >> >> Is the driver simply programming the HW incorrectly then? I would expect the driver to do something roughly like: >> >> * Set up the HW to execute the transaction; everything except enabling IRQs and telling the HW to "go" >> * Clear stale IRQ status (perhaps do this right at the start) >> * Enable IRQs >> * Tell the HW to "go" > > no it actually works differently: > * set TA and some of the other flags (POL,...) in CS-REG, which activates > the SPI-block and as soon as there is data it will start the transfer. > if you set the IRQ flags here, then IRQ will trigger as soon as the write > hits the HW-block, as the fifo is empty > > This is what the driver does right in 4.0rcX - it also sets the interrupt > flags. > > But the problem is that it takes typcially 20us for the ISR to get REALLY > executed, which means an unecessary delay of 20us before the transfer > really starts. > > The prefill approach instead: > * sets TA and some other flags, in CS,but leaves Interrupts disabled > * fills in FIFO which initiates the transfer > * sets the same values in CS-REG as above but now also with IRQ enabled Can you fill the FIFO as step 1? That way, you could presumably write to that CS-REG just once, with all the desired bits set in one go. > At this point in time there is a slight chance that CS will toggle for <1us > when using native CS - this does not happen with gpio-CS for obvious reasons. > > Similar for the situation where you request a transfer of 13 Bytes via DMA. > You do this by filling in SPI-LEN with 13 to tell the engine to only shift > 13 bytes out even if DMA fills in 16 bytes. That sounds pretty typical for a 32-bit/dword-wide FIFO containing 8-bit data. Surely DMA isn't relevant here; every dword write will put 4 bytes into the FIFO irrespective of whether the write comes from the CPU or a DMA engine. Generally, the HW will pull dword-sized values out of the FIFO, and ignore any extra bytes that are packed into the dword beyond whatever the programmed transfer length is. Doesn't the bcm283x SPI HW do that too? > When this transfer is finished > then there are still 3 bytes in the FIFO, so you have to clean that by > setting CLEAR_TX/CLEAR_RX in the CS-register. That sounds pretty unusual. Have you tried not cleaning out the FIFO and seeing if the next transfer does in fact only transfer the data you write next, not the extra/stale bytes from the previous transfer? -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On 08.04.2015, at 04:01, Stephen Warren <swarren@wwwdotorg.org> wrote: > > Can you fill the FIFO as step 1? That way, you could presumably write to > that CS-REG just once, with all the desired bits set in one go. Actually the "manual" says that you first need to enable TA - even for a polling driver and then you can fill the FIFO. But I have also tried it - it does not work, as the TA flag not only initiates the transfer but also activates/deactivates the whole SPI block (and probably also all the registers). So the data filled into the FIFOs is lost when TA is inactive. > That sounds pretty typical for a 32-bit/dword-wide FIFO containing 8-bit > data. Surely DMA isn't relevant here; every dword write will put 4 bytes > into the FIFO irrespective of whether the write comes from the CPU or a > DMA engine. Generally, the HW will pull dword-sized values out of the > FIFO, and ignore any extra bytes that are packed into the dword beyond > whatever the programmed transfer length is. Doesn't the bcm283x SPI HW > do that too? As the HW can do LoSSI (=9 bit) with the corresponding "command" parsing where some commands trigger 8 bit reads, others 24 and others 32 bit reads, it seems as if it reads as many byte from the fifo as necessary for the specific transfer. > That sounds pretty unusual. Have you tried not cleaning out the FIFO and > seeing if the next transfer does in fact only transfer the data you > write next, not the extra/stale bytes from the previous transfer? I remember having tried this more than a year ago, but then you get the "missing" data from the last word transfer (when using DMA). As I am planning on implementing DMA I guess I will get there. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015-04-07 18:28, Mark Brown wrote: > On Tue, Apr 07, 2015 at 09:39:03AM -0600, Stephen Warren wrote: >> On 04/07/2015 12:25 AM, Martin Sperl wrote: >> ... then not touch any CS-related register for the entire transfer. There >> shouldn't be a need to enable/disable IRQs during the transfer; just leave >> them enabled the entire time, until all bytes have been transferred. > > We will need to make sure /CS is kept asserted between transfers in a > message too. The problem is here that you need to touch the CS-register when an individual spi-transfer is finished, as only that way you can disable the (level)interrupts, which would fire immediately again when you exit chip-select. So you are at the risk of it happening at that phase as well... and during the next spi_transfer_one you need to reenable the interrupts again. Anyway: my latest incarnation of the driver does now implement full cs_gpio operation with native-cs getting translated during the new spi_register_cs phase when registering the bus using the (new) function-pointer register_cs in spi_master. So this issue goes away and the code becomes smaller at the same time. This is also used to set up chip_selects to their defaults levels prior to any activity on the SPI bus. -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c index 72d4525..adf157b 100644 --- a/drivers/spi/spi-bcm2835.c +++ b/drivers/spi/spi-bcm2835.c @@ -203,6 +203,22 @@ static int bcm2835_spi_transfer_one(struct spi_master *master, bs->tx_len = tfr->len; bs->rx_len = tfr->len; + /* fill in fifo if we have gpio-cs + * note that there have been rare events where the native-CS + * flapped for <1us which may change the behaviour + * with gpio-cs this does not happen, so it is implemented + * only for this case + */ + if (gpio_is_valid(spi->cs_gpio)) { + /* enable HW block, but without interrupts enabled + * this would triggern an immediate interrupt + */ + bcm2835_wr(bs, BCM2835_SPI_CS, + cs | BCM2835_SPI_CS_TA); + /* fill in tx fifo as much as possible */ + bcm2835_wr_fifo(bs); + } + /* * Enable the HW block. This will immediately trigger a DONE (TX * empty) interrupt, upon which we will fill the TX FIFO with the