[0/3] drm: tinydrm driver for adafruit PiTFT 3.5" touchscreen

Message ID 20181024184313.2967-1-eric@anholt.net (mailing list archive)

Message

Eric Anholt Oct. 24, 2018, 6:43 p.m. UTC
I was going to start working on making the vc4 driver work with
tinydrm panels, but it turned out tinydrm didn't have the panel I had
previously bought.  So, last night I ported the fbtft staging
driver over to DRM.

It seems to work (with DT at
https://github.com/anholt/linux/commits/drm-misc-next-hx8357d) --
fbdev works great including rotated, and so does modetest.  However,
when X11 comes up at 16bpp, I get:

https://photos.app.goo.gl/8tuhzPFFoDGamEfk8

If I have tinydrm set a preferred bpp of 24, X looks great.  Noralf,
any ideas?

Eric Anholt (3):
  dt-bindings: new binding for Himax HX8357D display panels
  drm: Add an hx8357d tinydrm driver.
  drm/tinydrm: Fix setting of the column/page end addresses.

 .../bindings/display/himax,hx8357d.txt        |  25 ++
 drivers/gpu/drm/tinydrm/Kconfig               |  11 +
 drivers/gpu/drm/tinydrm/Makefile              |   1 +
 drivers/gpu/drm/tinydrm/hx8357d.c             | 262 ++++++++++++++++++
 drivers/gpu/drm/tinydrm/hx8357d.h             |  71 +++++
 drivers/gpu/drm/tinydrm/mipi-dbi.c            |   4 +-
 6 files changed, 372 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/display/himax,hx8357d.txt
 create mode 100644 drivers/gpu/drm/tinydrm/hx8357d.c
 create mode 100644 drivers/gpu/drm/tinydrm/hx8357d.h

Comments

Eric Anholt Oct. 25, 2018, 4:29 p.m. UTC | #1
Also, with these patches and the format modifier patch I just sent, mesa
with vc4 is now working with this driver on this branch:

https://gitlab.freedesktop.org/anholt/mesa/commits/kmsro

Now I wonder how we can improve performance of the SPI updates.
Noralf Trønnes Oct. 25, 2018, 9:52 p.m. UTC | #2
On 25.10.2018 18.29, Eric Anholt wrote:
> Also, with these patches and the format modifier patch I just sent, mesa
> with vc4 is now working with this driver on this branch:
>
> https://gitlab.freedesktop.org/anholt/mesa/commits/kmsro

Ah, nice to see this happening!
Getting hw rendering was one of the advantages I saw DRM could provide
over fbdev on these displays. Little did I know how complicated graphics
was outside fbdev, so I was unable to realise this myself.

The current solution to get hw rendering is to have a userspace process
that continuously copies the framebuffer:
https://github.com/tasanakorn/rpi-fbcp

It's used by some of the small DIY handheld game consoles that run
emulators that require hw rendering.

> Now I wonder how we can improve performance of the SPI updates.

At what SPI speed are you running? The datasheets for most of these
display controllers list the max speed as 10MHz, but almost all of them
can go faster. Some are reported going as high as 70-80MHz. That's for
the pixel data transfer, not the commands. tinydrm/mipi-dbi.c sends
commands at 10MHz and pixels at full speed (mipi_dbi_spi_cmd_max_speed()).
Most panels I have run at 32MHz or 48MHz.

Almost all the time is spent in the SPI transfer, so every Hz counts. On
the Pi there's byte swapping because the DMA-capable SPI controller can't
do 16-bit transfers (tinydrm_swab16()). If I remember correctly, this has
negligible impact on performance.
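
The byte swap described here is a simple per-pixel operation. A minimal
sketch in plain C, assuming a buffer of native-endian RGB565 pixels;
swab16_buf() is a hypothetical name used for illustration and is not the
tinydrm helper itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Swap the two bytes of every 16-bit pixel so that a controller limited
 * to 8-bit words still sends the high byte of each pixel first.
 * Hypothetical helper for illustration, not the kernel's tinydrm_swab16().
 */
static void swab16_buf(uint16_t *dst, const uint16_t *src, size_t npixels)
{
	size_t i;

	assert(dst && src);
	for (i = 0; i < npixels; i++)
		dst[i] = (uint16_t)((src[i] << 8) | (src[i] >> 8));
}
```

The swap needs a separate destination buffer (or an in-place pass), which
is why a swapped update cannot hand the framebuffer straight to SPI.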

The SPI controller/driver on the Pi has some restrictions on the speeds
to choose from because the divisor has to be a multiple of two
(bcm2835_spi_transfer_one()).

A full update on a 320x480 RGB565 panel is 300 KiB (320 x 480 x 2 bytes),
so it's a lot to push over SPI. A 2.8" 320x240 panel is more suitable for
video frame rates because of the lower resolution.
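
To put rough numbers on this, here is a small sketch that computes the
size of a full-frame update and an upper bound on the update rate. It
assumes the bus does nothing but shift out pixel data at the given clock
(8 clocks per byte, no command or inter-transfer overhead); the function
names are made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one full-frame update. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
			    uint32_t bytes_per_pixel)
{
	return width * height * bytes_per_pixel;
}

/*
 * Upper bound on full-frame updates per second, in millihertz, assuming
 * the SPI clock really runs at spi_hz and only pixel data is sent.
 */
static uint32_t max_fps_millihz(uint32_t spi_hz, uint32_t bytes)
{
	assert(bytes > 0);
	return (uint32_t)(((uint64_t)spi_hz * 1000u) / ((uint64_t)bytes * 8u));
}
```

Under these assumptions a 320x480 RGB565 frame at a true 32MHz tops out
at roughly 13 full updates per second, and a 320x240 frame at a true
48MHz at roughly 39. The real rate will be lower, since the effective
clock depends on the divider the controller ends up with.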

I'll look at the patches during the weekend.

Noralf.
Eric Anholt Oct. 26, 2018, 2:30 a.m. UTC | #3
Noralf Trønnes <noralf@tronnes.org> writes:

> On 25.10.2018 18.29, Eric Anholt wrote:
>
>> Now I wonder how we can improve performance of the SPI updates.
>
> At what SPI speed are you running? The datasheet for most of these
> display controllers list the max speed as 10MHz, but almost all of them
> can go faster. Some are reported going as high as 70-80MHz. That's for
> the pixel data transfer, not the commands. tinydrm/mipi-dbi.c sends
> commands at 10MHz and pixels at full speed (mipi_dbi_spi_cmd_max_speed()).
> Most panels I have run at 32MHz or 48MHz.

I copied the DT from the adafruit tree, which has it at 32 MHz.  System
performance seems to be limited by the copy and format conversion, I
think -- in particular, I wonder if we shouldn't be doing our dirty
copies in our own workqueue.  I haven't managed to get any really good
profiling data yet, though.

glxgears at 128x128 is nice and smooth, and at 480x320 it's 6 fps.
That's not filling 32 MHz of SPI.  On the other hand, I would have
expected the uncached reads for the 4-to-2 swapped conversion (4 bytes
per pixel in, 2 byte-swapped bytes out) to be able to go faster than
3.5 MB/s.  If it's the uncached reads, we could at least use NEON for the
copy to cached memory, and probably even do the whole conversion in NEON
with a bit more thought.
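
For reference, the conversion in question can be sketched in scalar C as
follows. This is only an illustration of XRGB8888 to byte-swapped RGB565
packing, assuming pixels laid out as 0xXXRRGGBB; the function names are
hypothetical and this is not the tinydrm code path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one XRGB8888 pixel (0xXXRRGGBB) into RGB565, dropping the low bits. */
static uint16_t xrgb8888_to_rgb565(uint32_t px)
{
	return (uint16_t)(((px & 0x00F80000) >> 8) |
			  ((px & 0x0000FC00) >> 5) |
			  ((px & 0x000000F8) >> 3));
}

/* Convert one line and byte-swap each result for an 8-bit SPI transfer. */
static void xrgb8888_to_rgb565_swab_line(uint16_t *dst, const uint32_t *src,
					 size_t npixels)
{
	size_t i;

	assert(dst && src);
	for (i = 0; i < npixels; i++) {
		uint16_t v = xrgb8888_to_rgb565(src[i]);

		dst[i] = (uint16_t)((v << 8) | (v >> 8));
	}
}
```

Each output pixel needs one 32-bit read from the source, so if the source
mapping is uncached, the read side is the likely cost here rather than
the bit shuffling.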

Another option: use a vc4 RCL to do RGBA8888 to RGB565, since that will
be less pressure on the bus.  But then, I suppose I should just figure
out what's going on that makes X11 at RGBA8888 break, and fix that so we
don't even have to do that conversion.

I keep hoping there's some way we could feed output from the DISPSLAVE
HVS register directly to the SPI master -- FIFO32 gets us close (two
16-bit pixels packed next to each other, leftmost in the lower 2 bytes),
but the need for byte swapping (as opposed to R/B swapping) I think
makes it impossible.

> Almost all the time is spent in the SPI transfer, so every hz counts. On
> the Pi there's byte swapping because the DMA capable SPI controller can't
> do 16-bit (tinydrm_swab16()). If I remember correctly this has negligible
> impact on performance.
>
> The SPI controller/driver on the Pi has some restrictions on the speeds
> to choose from because the divisor has to be a multiple of two
> (bcm2835_spi_transfer_one()).

That's weird.  My specs say CDIV must be a *power* of two, with lower
values rounded down.  I guess that means we might be running things
fast, not slow, though (and at 32 MHz out of 250 MHz, we should be
getting the same CDIV).
Noralf Trønnes Oct. 26, 2018, 7:16 p.m. UTC | #4
On 26.10.2018 04.30, Eric Anholt wrote:
> Noralf Trønnes <noralf@tronnes.org> writes:
>
>>> Now I wonder how we can improve performance of the SPI updates.
>> At what SPI speed are you running? The datasheet for most of these
>> display controllers list the max speed as 10MHz, but almost all of them
>> can go faster. Some are reported going as high as 70-80MHz. That's for
>> the pixel data transfer, not the commands. tinydrm/mipi-dbi.c sends
>> commands at 10MHz and pixels at full speed (mipi_dbi_spi_cmd_max_speed()).
>> Most panels I have run at 32MHz or 48MHz.
> I copied the DT from the adafruit tree, which has it at 32mhz.  System
> performance seems to be limited by the copy and format conversion I
> think -- in particular, I wonder if we shouldn't be doing our dirty
> copies in our own workqueue.  I haven't managed to get any really good
> profiling data yet, though.
>
> glxgears at 128x128 is nice and smooth, and at 480x320 it's 6fps.
> That's not filling 32mhz of SPI.  On the other hand, I would have
> expected the uncached reads for the 4-to-2 swapped conversion to be able
> to go faster than 3.5mb/sec.  If it's the uncached reads, we could at
> least use NEON for the copy to cached, and probably even do the whole
> conversion in NEON with a bit more thought.
>
> Another option: use a vc4 RCL to do RGBA8888 to RGB565, since that will
> be less pressure on the bus.  But then, I suppose I should just figure
> out what's going on that makes X11 at RGBA8888 break, and fix that so we
> don't even have to do that conversion.
>
> I keep hoping there's some way we could feed output from the DISPSLAVE
> HVS register directly to the SPI master -- FIFO32 gets us close (two
> 16-bit pixels packed next to each other, leftmost in the lower 2 bytes),
> but the need for byte swapping (as opposed to R/B swapping) I think
> makes it impossible.

I just did some speed tests on a 320x240 display at three different speeds.
I also tried with byteswapping disabled. Only full updates will benefit
from passing the buffer straight through to SPI. This is because partial
updates are copied to a transfer buffer anyway to minimize SPI transfer
time. There's no need to transfer things that haven't changed, and a
memory copy is far cheaper than an SPI transfer.
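
The partial-update copy can be pictured like this: only the dirty
rectangle is gathered into a contiguous transfer buffer, which is then
sent over SPI. A minimal sketch, assuming an RGB565 framebuffer and a
clip rectangle with exclusive x2/y2; struct clip_rect and
copy_clip_rgb565() are illustrative names, not the DRM types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Dirty rectangle in pixels; x2 and y2 are exclusive (assumed convention). */
struct clip_rect {
	uint32_t x1, y1, x2, y2;
};

/*
 * Copy the clip rectangle out of a framebuffer with pitch_pixels pixels
 * per line into a tightly packed buffer. Returns the number of pixels
 * copied, which is all that then has to go over SPI.
 */
static size_t copy_clip_rgb565(uint16_t *dst, const uint16_t *fb,
			       uint32_t pitch_pixels,
			       const struct clip_rect *clip)
{
	uint32_t w, y;
	size_t n = 0;

	assert(clip->x2 >= clip->x1 && clip->y2 >= clip->y1);
	w = clip->x2 - clip->x1;
	for (y = clip->y1; y < clip->y2; y++) {
		memcpy(dst + n, fb + (size_t)y * pitch_pixels + clip->x1,
		       w * sizeof(*dst));
		n += w;
	}
	return n;
}
```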

SPI at 48MHz:

pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    48000000

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@XR24 -v
setting mode 320x240-0Hz@XR24 on connectors 28, crtc 30
freq: 24.87Hz
freq: 24.79Hz

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 26.33Hz
freq: 26.30Hz

disable byteswapping:
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 28.40Hz
freq: 28.43Hz


SPI at 64MHz (seems to work):

pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    64000000

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@XR24 -v
setting mode 320x240-0Hz@XR24 on connectors 28, crtc 30
freq: 32.74Hz
freq: 32.69Hz

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 35.44Hz
freq: 35.19Hz

disabled byteswapping:
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 39.29Hz
freq: 39.11Hz


SPI at 128MHz (not at all as garbled as I expected):

pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
   128000000

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@XR24 -v
setting mode 320x240-0Hz@XR24 on connectors 28, crtc 30
freq: 48.69Hz
freq: 48.40Hz

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 53.61Hz
freq: 54.45Hz

disabled byteswapping:
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 63.16Hz
freq: 64.19Hz


This is how I disabled byteswapping for this test:

diff --git a/drivers/gpu/drm/tinydrm/mipi-dbi.c b/drivers/gpu/drm/tinydrm/mipi-dbi.c
index cb3441e51d5f..fa5d81521485 100644
--- a/drivers/gpu/drm/tinydrm/mipi-dbi.c
+++ b/drivers/gpu/drm/tinydrm/mipi-dbi.c
@@ -228,6 +228,8 @@ static int mipi_dbi_fb_dirty(struct drm_framebuffer *fb,
         DRM_DEBUG("Flushing [FB:%d] x1=%u, x2=%u, y1=%u, y2=%u\n", fb->base.id,
                   clip.x1, clip.x2, clip.y1, clip.y2);

+       full = true;
+       swap = false;
         if (!mipi->dc || !full || swap ||
             fb->format->format == DRM_FORMAT_XRGB8888) {
                 tr = mipi->tx_buf;


>> Almost all the time is spent in the SPI transfer, so every hz counts. On
>> the Pi there's byte swapping because the DMA capable SPI controller can't
>> do 16-bit (tinydrm_swab16()). If I remember correctly this has negligible
>> impact on performance.
>>
>> The SPI controller/driver on the Pi has some restrictions on the speeds
>> to choose from because the divisor has to be a multiple of two
>> (bcm2835_spi_transfer_one()).
> That's weird.  My specs say CDIV must be a *power* of two, with lower
> values rounded down.  I guess that means we might be running things
> fast, not slow, though (and at 32mhz out of 250, we should be getting
> the same CDIV).

Yes, that's what the datasheet says.
When fbtft was out-of-tree I distributed a custom kernel with fbtft and
Martin Sperl's DMA capable spi-bcm2708. In that version I also allowed
odd cdiv's with no ill effects reported:
https://github.com/notro/spi-bcm2708/wiki#spi-clock-divider
(see the link to the forum post for details)
Maybe the hw just ignores odd cdiv's, I don't know.

Noralf.
Noralf Trønnes Oct. 26, 2018, 8:57 p.m. UTC | #5
On 26.10.2018 21.16, Noralf Trønnes wrote:
>>> The SPI controller/driver on the Pi has some restrictions on the speeds
>>> to choose from because the divisor has to be a multiple of two
>>> (bcm2835_spi_transfer_one()).
>> That's weird.  My specs say CDIV must be a *power* of two, with lower
>> values rounded down.  I guess that means we might be running things
>> fast, not slow, though (and at 32mhz out of 250, we should be getting
>> the same CDIV).
>
> Yes, that's what the datasheet says.
> When fbtft was out-of-tree I distributed a custom kernel with fbtft and
> Martin Sperl's DMA capable spi-bcm2708. In that version I also allowed
> odd cdiv's with no ill effects reported:
> https://github.com/notro/spi-bcm2708/wiki#spi-clock-divider
> (see the link to the forum post for details)
> Maybe the hw just ignores odd cdiv's, I don't know.
>

I lifted the restriction on odd cdiv values, but it didn't give more speeds.
The test does demonstrate that cdiv doesn't have to be a power of two, though.

pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    40000000
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 26.32Hz
freq: 26.22Hz

pi@pi2835:~$ dmesg | grep cdiv
[   59.820514] bcm2835_spi_transfer_one: cdiv=7


pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    48000000
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 26.32Hz
freq: 26.28Hz

pi@pi2835:~$ dmesg | grep cdiv
[   59.250549] bcm2835_spi_transfer_one: cdiv=6


pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    56000000
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 35.33Hz
freq: 35.42Hz

pi@pi2835:~$ dmesg | grep cdiv
[   67.760747] bcm2835_spi_transfer_one: cdiv=5


pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    64000000

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 35.40Hz
freq: 35.35Hz

pi@pi2835:~$ dmesg | grep cdiv
[   76.061747] bcm2835_spi_transfer_one: cdiv=4


pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
    96000000

pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 54.46Hz
freq: 54.65Hz

pi@pi2835:~$ dmesg | grep cdiv
[  623.200407] bcm2835_spi_transfer_one: cdiv=3


pi@pi2835:~$ od -An -vtu4 --endian=big /sys/bus/spi/devices/spi0.0/of_node/spi-max-frequency
   128000000
pi@pi2835:~$ ./libdrm/tests/modetest/modetest -M mi0283qt -s 28:320x240@RG16 -v
setting mode 320x240-0Hz@RG16 on connectors 28, crtc 30
freq: 54.43Hz
freq: 54.31Hz

pi@pi2835:~$ dmesg | grep cdiv
[   65.350713] bcm2835_spi_transfer_one: cdiv=2
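
The cdiv values in these dmesg lines are consistent with dividing an
assumed 250MHz core clock by the requested maximum frequency and rounding
up. A rough sketch of that arithmetic follows; the function names are
hypothetical and this is not a copy of bcm2835_spi_transfer_one(). The
second helper shows one way the "multiple of two" restriction could be
applied (rounding an odd divider up to the next even one), which is an
assumption about the step that was lifted for this test:

```c
#include <assert.h>
#include <stdint.h>

#define CORE_CLK_HZ 250000000u	/* assumed SPI core clock */

/* Smallest divider that keeps the bus clock at or below max_hz. */
static uint32_t spi_cdiv(uint32_t clk_hz, uint32_t max_hz)
{
	assert(max_hz > 0);
	return (clk_hz + max_hz - 1) / max_hz;	/* round up */
}

/* Same divider, with odd values bumped to the next even value. */
static uint32_t spi_cdiv_even(uint32_t clk_hz, uint32_t max_hz)
{
	uint32_t cdiv = spi_cdiv(clk_hz, max_hz);

	return cdiv + (cdiv & 1);
}
```

With CORE_CLK_HZ as the clock, spi_cdiv() gives 7, 6, 5, 4, 3 and 2 for
40, 48, 56, 64, 96 and 128MHz, matching the log lines.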


I can add here that XRGB8888 was added to support things like plymouth,
which only supports that one format. RGB565 is the native format supported
by the driver. These MIPI-compatible controllers do support RGB888, but
there's no point in implementing it, since the extra byte per pixel would
increase the transfer time by 50%.

Noralf.
Noralf Trønnes Oct. 31, 2018, 4:27 p.m. UTC | #6
On 25.10.2018 18.29, Eric Anholt wrote:
> Now I wonder how we can improve performance of the SPI updates.

I just remembered that tinydrm does a full flush on page flips. If the
dirtyfb ioctl is also used, you end up with two flushes. This could
explain the bad performance. I have been waiting for the
dirtyfb-through-atomic work to come to fruition, but nothing has been
merged yet. That would give dirty tracking on page flips.

Noralf.