mbox series

[v3,0/3] spi: Add DMA mode support to spi-qcom-qspi

Message ID 1681481153-24036-1-git-send-email-quic_vnivarth@quicinc.com (mailing list archive)
Headers show
Series spi: Add DMA mode support to spi-qcom-qspi | expand

Message

Vijaya Krishna Nivarthi April 14, 2023, 2:05 p.m. UTC
There are large number of QSPI irqs that fire during boot/init and later
on every suspend/resume.
This could be made faster by doing DMA instead of PIO.
Below is comparison for number of interrupts raised in 2 acenarios...
Boot up and stabilise
Suspend/Resume

Sequence   PIO    DMA
=======================
Boot-up    69088  19284
S/R        5066   3430

Though we have not made measurements for speed, power we expect
the performance to be better with DMA mode and no regressions were
encountered in testing.

Vijaya Krishna Nivarthi (3):
  spi: dt-bindings: qcom,spi-qcom-qspi: Add iommus
  arm64: dts: qcom: sc7280: Add stream-id of qspi to iommus
  spi: spi-qcom-qspi: Add DMA mode support
---
v2 -> v3:
- Modified commit messages
- Made a change to driver based on re-review

v1 -> v2:
- Added documentation file to the series
- Made changes to driver based on HPG re-review
---
 .../bindings/spi/qcom,spi-qcom-qspi.yaml           |   3 +
 arch/arm64/boot/dts/qcom/sc7280.dtsi               |   1 +
 drivers/spi/spi-qcom-qspi.c                        | 434 +++++++++++++++++++--
 3 files changed, 407 insertions(+), 31 deletions(-)

Comments

Douglas Anderson April 14, 2023, 3:48 p.m. UTC | #1
Hi,

On Fri, Apr 14, 2023 at 7:06 AM Vijaya Krishna Nivarthi
<quic_vnivarth@quicinc.com> wrote:
>
> There are large number of QSPI irqs that fire during boot/init and later
> on every suspend/resume.
> This could be made faster by doing DMA instead of PIO.
> Below is comparison for number of interrupts raised in 2 acenarios...

s/acenarios/scenarios

> Boot up and stabilise
> Suspend/Resume
>
> Sequence   PIO    DMA
> =======================
> Boot-up    69088  19284
> S/R        5066   3430
>
> Though we have not made measurements for speed, power we expect
> the performance to be better with DMA mode and no regressions were
> encountered in testing.

Measuring the speed isn't really very hard, so I gave it a shot.

I used a truly terrible python script to do this on a Chromebook:

--

import os
import time

os.system("""
stop ui
stop powerd

cd /sys/devices/system/cpu/cpufreq
for policy in policy*; do
  cat ${policy}/cpuinfo_max_freq > ${policy}/scaling_min_freq
done
""")

all_times = []
for i in range(1000):
  start = time.time()
  os.system("flashrom -p host -r /tmp/foo.bin")
  end = time.time()

  all_times.append(end - start)
  print("Iteration %d, min=%.2f, max=%.2f, avg=%.2f" % (
      i, min(all_times), max(all_times), sum(all_times) / len(all_times)))

--

The good news is that after applying your patches the loop runs _much_ faster.

The bad news is that it runs much faster because it very quickly fails
and errors out. flashrom just keeps reporting:

Opened /dev/mtd0 successfully
Found Programmer flash chip "Opaque flash chip" (8192 kB,
Programmer-specific) on host.
Reading flash... Cannot read 0x001000 bytes at 0x000000: Connection timed out
read_flash: failed to read (00000000..0x7fffff).
Read operation failed!
FAILED.
FAILED

I went back and tried v1, v2, and v3 and all three versions fail.
Douglas Anderson April 14, 2023, 4:42 p.m. UTC | #2
Hi,

On Fri, Apr 14, 2023 at 8:48 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Fri, Apr 14, 2023 at 7:06 AM Vijaya Krishna Nivarthi
> <quic_vnivarth@quicinc.com> wrote:
> >
> > There are large number of QSPI irqs that fire during boot/init and later
> > on every suspend/resume.
> > This could be made faster by doing DMA instead of PIO.
> > Below is comparison for number of interrupts raised in 2 acenarios...
>
> s/acenarios/scenarios
>
> > Boot up and stabilise
> > Suspend/Resume
> >
> > Sequence   PIO    DMA
> > =======================
> > Boot-up    69088  19284
> > S/R        5066   3430
> >
> > Though we have not made measurements for speed, power we expect
> > the performance to be better with DMA mode and no regressions were
> > encountered in testing.
>
> Measuring the speed isn't really very hard, so I gave it a shot.
>
> I used a truly terrible python script to do this on a Chromebook:
>
> --
>
> import os
> import time
>
> os.system("""
> stop ui
> stop powerd
>
> cd /sys/devices/system/cpu/cpufreq
> for policy in policy*; do
>   cat ${policy}/cpuinfo_max_freq > ${policy}/scaling_min_freq
> done
> """)
>
> all_times = []
> for i in range(1000):
>   start = time.time()
>   os.system("flashrom -p host -r /tmp/foo.bin")
>   end = time.time()
>
>   all_times.append(end - start)
>   print("Iteration %d, min=%.2f, max=%.2f, avg=%.2f" % (
>       i, min(all_times), max(all_times), sum(all_times) / len(all_times)))
>
> --
>
> The good news is that after applying your patches the loop runs _much_ faster.
>
> The bad news is that it runs much faster because it very quickly fails
> and errors out. flashrom just keeps reporting:
>
> Opened /dev/mtd0 successfully
> Found Programmer flash chip "Opaque flash chip" (8192 kB,
> Programmer-specific) on host.
> Reading flash... Cannot read 0x001000 bytes at 0x000000: Connection timed out
> read_flash: failed to read (00000000..0x7fffff).
> Read operation failed!
> FAILED.
> FAILED
>
> I went back and tried v1, v2, and v3 and all three versions fail.

Ah, I see what's likely the problem. Your patch series only adds the
"iommus" for sc7280 but I'm testing on sc7180. That means:

1. You need to add the iommus to _all_ the boards that have qspi. That
means sc7280, sc7180, and sdm845.

2. Ideally the code should still be made to work (it should fall back
to PIO mode) if DMA isn't properly enabled. That would keep old device
trees working, which we're supposed to do.

-Doug
Vijaya Krishna Nivarthi April 14, 2023, 5:01 p.m. UTC | #3
On 4/14/2023 10:12 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, Apr 14, 2023 at 8:48 AM Doug Anderson <dianders@chromium.org> wrote:
>> Hi,
>>
>> On Fri, Apr 14, 2023 at 7:06 AM Vijaya Krishna Nivarthi
>> <quic_vnivarth@quicinc.com> wrote:
>>> There are large number of QSPI irqs that fire during boot/init and later
>>> on every suspend/resume.
>>> This could be made faster by doing DMA instead of PIO.
>>> Below is comparison for number of interrupts raised in 2 acenarios...
>> s/acenarios/scenarios
>>
>>> Boot up and stabilise
>>> Suspend/Resume
>>>
>>> Sequence   PIO    DMA
>>> =======================
>>> Boot-up    69088  19284
>>> S/R        5066   3430
>>>
>>> Though we have not made measurements for speed, power we expect
>>> the performance to be better with DMA mode and no regressions were
>>> encountered in testing.
>> Measuring the speed isn't really very hard, so I gave it a shot.
>>
>> I used a truly terrible python script to do this on a Chromebook:
>>
>> --
>>
>> import os
>> import time
>>
>> os.system("""
>> stop ui
>> stop powerd
>>
>> cd /sys/devices/system/cpu/cpufreq
>> for policy in policy*; do
>>    cat ${policy}/cpuinfo_max_freq > ${policy}/scaling_min_freq
>> done
>> """)
>>
>> all_times = []
>> for i in range(1000):
>>    start = time.time()
>>    os.system("flashrom -p host -r /tmp/foo.bin")
>>    end = time.time()
>>
>>    all_times.append(end - start)
>>    print("Iteration %d, min=%.2f, max=%.2f, avg=%.2f" % (
>>        i, min(all_times), max(all_times), sum(all_times) / len(all_times)))
>>
>> --
>>
>> The good news is that after applying your patches the loop runs _much_ faster.
>>
>> The bad news is that it runs much faster because it very quickly fails
>> and errors out. flashrom just keeps reporting:
>>
>> Opened /dev/mtd0 successfully
>> Found Programmer flash chip "Opaque flash chip" (8192 kB,
>> Programmer-specific) on host.
>> Reading flash... Cannot read 0x001000 bytes at 0x000000: Connection timed out
>> read_flash: failed to read (00000000..0x7fffff).
>> Read operation failed!
>> FAILED.
>> FAILED
>>
>> I went back and tried v1, v2, and v3 and all three versions fail.
> Ah, I see what's likely the problem. Your patch series only adds the
> "iommus" for sc7280 but I'm testing on sc7180. That means:
>
> 1. You need to add the iommus to _all_ the boards that have qspi. That
> means sc7280, sc7180, and sdm845.
>
> 2. Ideally the code should still be made to work (it should fall back
> to PIO mode) if DMA isn't properly enabled. That would keep old device
> trees working, which we're supposed to do.


Thank you very much for the review, script, test and quick debug.
Will check same and update a v4.

-Vijay/


> -Doug