
imx-drm: Add HDMI support

Message ID 20131030190123.GM16735@n2100.arm.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Russell King - ARM Linux Oct. 30, 2013, 7:01 p.m. UTC
On Wed, Oct 30, 2013 at 01:01:15PM -0500, Matt Sealey wrote:
> I seem to
> remember it needs to be within 2% - 1 bit extra on the fraction and
> you're 3 times that over the range of acceptable clocks monitors MUST
> accept by specification.

I think 2% may be too generous - that's what I originally tried when
fiddling around with this and it wasn't good enough in some
circumstances for my TV to produce a picture.

> On i.MX51, too, the IPU HSC can't go above 133MHz without making the
> system unstable, so any mode that needs 148MHz (most high HDMI modes)
> won't work.

That's where validating the modes you're prepared to support becomes
important, and there's hooks in DRM for doing that.  If you can't
support those with 148.5MHz dotclocks, then you have to return
MODE_CLOCK_HIGH from the connector's mode_valid callback.  That will
result in DRM pruning the modes reported by the connector to those
which the display hardware can support.
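
As a rough illustration of that pruning (the callback name and the
MAX_PIXCLK_KHZ limit below are placeholders, not actual imx-drm code),
a connector's mode_valid only has to compare the mode's pixel clock
against whatever the display path can actually generate:

static int hdmi_connector_mode_valid(struct drm_connector *connector,
				     struct drm_display_mode *mode)
{
	/* mode->clock is in kHz; MAX_PIXCLK_KHZ is an assumed SoC limit */
	if (mode->clock > MAX_PIXCLK_KHZ)
		return MODE_CLOCK_HIGH;

	return MODE_OK;
}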

> There's no good reason to use the DI internal clock from IPU HSC,
> unless you can guarantee an EXACT rate coming out (i.e. if you need
> 64MHz, then you can get that with 128MHz/2.0 - which is fine.)

This is exactly what I'm doing on my Carrier-1 board.  If the output
can use the internal clock (iow, if IPU_DI_CLKMODE_EXT is clear)
and the IPU clock can generate the dotclock within 1% of the requested
rate, then we set the DI to use the IPU clock and set the divisor
appropriately.

If the IPU clock can't satisfy the dot clock within 1%, I try to set
the external DI clock to the required rate, read it back and calculate
the required DI integer divisor.  Yes, I could allow some fractional
divisors here, but at the moment it's easier not to.
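
A minimal sketch of that selection logic (the struct fields and the
ipu_di_use_*_clk() helpers here are illustrative only, not the actual
ipu-di.c code from the patch below):

	unsigned long in_rate, rate, diff;
	unsigned int div;

	if (!(sig->clkflags & IPU_DI_CLKMODE_EXT)) {
		/* candidate divisor from the internal IPU clock */
		in_rate = clk_get_rate(di->clk_ipu);
		div = DIV_ROUND_CLOSEST(in_rate, sig->pixelclock);

		if (div) {
			rate = in_rate / div;
			diff = rate > sig->pixelclock ?
				rate - sig->pixelclock : sig->pixelclock - rate;

			/* within 1% of the requested dotclock? use the IPU clock */
			if (diff * 100 < sig->pixelclock)
				return ipu_di_use_internal_clk(di, div);
		}
	}

	/* otherwise set the external DI clock and read back what we got */
	clk_set_rate(di->clk_di, sig->pixelclock);
	in_rate = clk_get_rate(di->clk_di);
	div = DIV_ROUND_CLOSEST(in_rate, sig->pixelclock);

	return ipu_di_use_external_clk(di, div);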

Each of the four cases I handle separately - and I've gotten rid of
the horrid CCF stuff in ipu-di.c: there's really no need for that
additional complication here, we're not sharing this mux and divisor
which are both internal to the DI with anything else.

Consequently, my ipu-di.c (relative to v3.12-rc7) looks like this,
and works pretty well with HDMI, providing a stable picture with
almost all mode configurations reported by my TV.

It's also fewer lines too. :)

 drivers/staging/imx-drm/ipu-v3/ipu-di.c |  328 ++++++++++---------------------
 1 files changed, 105 insertions(+), 223 deletions(-)

Comments

Matt Sealey Nov. 5, 2013, 10:39 p.m. UTC | #1
On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Oct 30, 2013 at 01:01:15PM -0500, Matt Sealey wrote:
>> I seem to
>> remember it needs to be within 2% - 1 bit extra on the fraction and
>> you're 3 times that over the range of accectable clocks monitors MUST
>> accept by specification.
>
> I think 2% may be too generous - that's what I originally tried when
> fiddling around with this and it wasn't good enough in some
> circumstances for my TV to produce a picture.

One day I *will* find the VESA spec where I read that. It is mentioned
somewhere in the HDMI spec as a reference, but the VESA doc isn't
public. There was a time I had access to it - so I have a copy. Will
figure it out...

>> On i.MX51, too, the IPU HSC can't go above 133MHz without making the
>> system unstable, so any mode that needs 148MHz (most high HDMI modes)
>> won't work.
>
> That's where validating the modes you're prepared to support becomes
> important, and there's hooks in DRM for doing that.

Nope! Not in the right place.

> If you can't support those with 148.5MHz dotclocks, then you have to return
> MODE_CLOCK_HIGH from the connector's mode_valid callback.

Here's the problem; connectors and encoders are initialized under DRM
before the display driver. For the connector to even KNOW what the
highest valid clock is, it would need to ask the lower level display
driver, since the connector and even the encoder have no clue of this
limitation and shouldn't be hardcoded, or even know what the input
pixel clock capabilities are.

If you specify a mode that the SII9022 can't display at all, or the
monitor connected (via EDID information) says it can't support that
from the maximum TMDS clock field in the EDID, sure, it can send back
MODE_CLOCK_HIGH, but otherwise what needs to happen is the sii9022
driver needs to know FAR too much.

> result in DRM pruning the modes reported by the connector to those
> which the display hardware can support.

Except that the connector cannot know what the display hardware can
support, as above.

I already pointed the above out on the DRM list maybe 18 months ago,
it's a big flaw in the DRM design, and the only way I can figure
around it is to have the connector driver look for its display
controller in the device tree and read out the clocks from there, or
just a property (max-tmds-clock?) which would hack around it,
overriding a higher value from EDID.

http://thread.gmane.org/gmane.comp.video.dri.devel/65394

>> There's no good reason to use the DI internal clock from IPU HSC,
>> unless you can guarantee an EXACT rate coming out (i.e. if you need
>> 64MHz, then you can get that with 128MHz/2.0 - which is fine.)
>
> This is exactly what I'm doing on my Carrier-1 board.  If the output
> can use the the internal clock (iow, if IPU_DI_CLKMODE_EXT is clear)
> and the IPU clock can generate the dotclock within 1% of the requested
> rate, then we set the DI to use the IPU clock and set the divisor
> appropriately.

In theory there are two things. One is that old VESA spec which says
"we expect some crappy clocks, so anything within a certain range will
work" and the CEA spec which says any mode which is specified as X MHz
as the clock rate would also be required to be displayed at
X*(1000/1001) (which covers NTSC weirdness). So in fact the variance
that really matters may be well under 1% - 1% could be too high. It depends
if they're following the CEA spec to the letter or the VESA specs to
the letter.
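
(As a quick worked example of that 1000/1001 offset, using the common
74.25MHz CEA pixel clock: 74.25MHz * 1000/1001 ~= 74.176MHz, a shift of
roughly 0.1% - so the tolerance that matters here is much tighter than
the 1-2% discussed above.)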

One thing that always got me, and I know this is an aside, is how many
"HDMI" systems people have where they're setting the default video
mode to 1024x768. You'd think nobody ever read the CEA-861 spec, ever
- it specifically states that the ONLY mode guaranteed is 640x480. In
fact, 1024x768 is not in the list for either primary OR secondary
defined CEA modes, so there's a high chance your TV won't do it.
Monitors - sure, they may do it, they may do it because they have to
for Windows Logo certification, but TVs don't get certified by
Microsoft.

> Each of the four cases I handle separately - and I've gotten rid of
> the horrid CCF stuff in ipu-di.c: there's really no need for that
> additional complication here, we're not sharing this mux and divisor
> which are both internal to the DI with anything else.
>
> Consequently, my ipu-di.c (relative to v3.12-rc7) looks like this,
> and works pretty well with HDMI, providing a stable picture with
> almost all mode configurations reported by my TV.
>
> It's also fewer lines too. :)

True, and it looks a lot like the Freescale drivers now, but I do
think creating a clockdev clock is probably the "correct" way to do
it, especially since in debug situations you'd be able to see this -
and its value, and its current parent - in sysfs along with the
other clocks.

It might also help to encourage people not to just play with internal
clocks, but to create clock devices and use them inside their drivers
- tons of audio stuff has manual clock management right now, and all
this information gets lost through (IMO) needless abstraction in the
audio subsystem. I might be convinced that it's not needless, but I
would still say there's no good reason NOT to implement any clock
which has a parent which is a registered clkdev clock as a clkdev
clock itself, and expose it with the rest of the clock debug
infrastructure.

I don't think I like that fractional dividers aren't used - at least
not until someone can come up with a great reason why not (except, I
think odd-fraction dividers are not stable, so maybe just masking off
the bottom bit of the fraction is the key) and someone tests it on more
than one monitor..

I can break out my monitor testing regime, once I figure out the
i.MX51 stuff. I have a few HDMI monitors I bought over the years which
were notoriously flakey, and a few DVI ones that had real trouble
being connected to a real HDMI source at some point.

Thanks,
Matt Sealey <neko@bakuhatsu.net>
Russell King - ARM Linux Nov. 7, 2013, 5:29 p.m. UTC | #2
On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > That's where validating the modes you're prepared to support becomes
> > important, and there's hooks in DRM for doing that.
> 
> Nope! Not in the right place.

Sure it's the right place.  It's the connector which gets to read the
modes, and the connector is part of the DRM system.

> > If you can't support those with 148.5MHz dotclocks, then you have to return
> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
> 
> Here's the problem; connectors and encoders are initialized under DRM
> before the display driver. For the connector to even KNOW what the
> highest valid clock is, it would need to ask the lower level display
> driver, since the connector and even encoder has no clue of this
> limitation and shouldn't be hardcoded, or even know what the input
> pixel clock capabilities are.

That's because the way imx-drm is setup with its multitude of individual
drivers is utter madness; there's no way that it will ever move out of
drivers/staging as long as it messes around violating the internals of
DRM by playing those kinds of games.  This was discussed at some
considerable length at kernel summit, with me as the main motivator for
it - including some private discussions with various people involved.

The DRM model is a card model, just like ALSA.  In this model, the
subsystem gets initialised once, and at that point all components of
the "card" are expected to be known.  The reverse is true when tearing
down a card: the whole card goes away in one go, individual components
do not.

What this means for DRM is that all CRTCs, connectors and encoders are
registered with DRM entirely within the card-level ->load callback,
and all of those are torn down at the ->unload callback.  Between those
two points in time, nothing in the configuration of the "card" ever
changes.

The same is true of ALSA: ALSA too does not support hotplugging
individual components.  This is why we have ASoC - ASoC handles the
grouping up of individual components, works out when all components
are available, and only then does it register the "card" with ALSA.
When any of those components go away, ASoC pulls the whole card from
ALSA.

As I say, this was discussed at kernel summit.  One thing was made
abundantly clear: DRM is _not_ going to support hotplugging individual
components of a card.  Moreover, what has been made perfectly clear
is that imx-drm is not going to move out of drivers/staging as long
as it abuses DRM's card model.

> I already pointed the above out on the DRM list maybe 18 months ago,
> it's a big flaw in the DRM design, and the only way I can figure
> around it is to have the connector driver look for it's display
> controller in the device tree and read out the clocks from there, or
> just a property (max-tmds-clock?) which would hack around it,
> overriding a higher value from EDID.

No, not at all.  If you build up the set of components first, before
involving DRM, then you have the full information.  I'm doing that
today: the DRM "card" can only finish initialisation when all known
connectors can find their CRTCs within the DRM card.  If any CRTC
(== IPU DIs) specified in the DT file for a "connector" (eg, HDMI
interface, lvds bridge, etc) are not present, the DRM card doesn't
initialise.

Once you have that in place, you have a way to find out the properties
of the CRTC, and you then have a way to validate the modes properly
against all parts of the display system.

> True, and it looks a lot like the Freescale drivers now,

Not at all.  The Freescale drivers use the clk API as this did, and just
as badly.  I'm afraid that by making that comment, you've just shown that
you much prefer writing long verbose emails rather than actually reading
and understanding the subject for which you're responding.

I refer you to this:

http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n155

which clearly shows Freescale's 4.1.0 BSP using the clk API to control
the DI clock mux.

When you look at the mess that using the clk API creates in the imx-drm
driver in staging:

static int clk_di_set_rate(struct clk_hw *hw, unsigned long rate,
                                unsigned long parent_rate)
{
        struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
        int div;
        u32 clkgen0;

        clkgen0 = ipu_di_read(di, DI_BS_CLKGEN0) & ~0xfff;

        div = ipu_di_clk_calc_div(parent_rate, rate);

        ipu_di_write(di, clkgen0 | div, DI_BS_CLKGEN0);

int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
{
..
        ret = clk_set_rate(di->clk_di_pixel, round);
...
        div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
        div = div / 16;         /* Now divider is integer portion */

        /* Setup pixel clock timing */
        /* Down time is half of period */
        ipu_di_write(di, (div << 16), DI_BS_CLKGEN1);

So, we call clk_set_rate(), that calls down to clk_di_set_rate() which
sets up the DI_BS_CLKGEN0 register.  We then have to read that register
later in ipu_di_init_sync_panel() to get the divider to program the
duty cycle which we want generated.  Now, compare that to this, where
we choose which clock locally according to some rules set down by the
requirements of the attached display bridge, calculate the divider
required, and then:

        /* Set the divider */
        ipu_di_write(di, clkgen0, DI_BS_CLKGEN0);

        /*
         * Set the high/low periods.  Bits 24:16 give us the falling edge,
         * and bits 8:0 give the rising edge.  LSB is fraction, and is
         * based on the divider above.  We want a 50% duty cycle, so set
         * the falling edge to be half the divider.
         */
        ipu_di_write(di, (clkgen0 >> 4) << 16, DI_BS_CLKGEN1);

That's quite a bit simpler, and doesn't violate abstraction levels at
all.  And getting rid of this clk API crap where it's not required (a)
ends up with a better driver which actually _works_, and (b) results
in fewer lines of code:

 drivers/staging/imx-drm/ipu-v3/ipu-di.c |  328 ++++++++++---------------------
 1 files changed, 105 insertions(+), 223 deletions(-)

> but I do
> think creating a clockdev clock is probably the "correct" way to do
> it, especially since in debug situations you'd be able to see this -
> and it's value, and it's current parent - in sysfs along with the
> other clocks.

The clk API just gets in the way of making the correct decisions.
Really, it does.  We need to calculate the divider to work out what
is the right clock to use, or whether we can use a clock.  Having
that hidden behind the clk API does not give us the ability to make
that decision.

> It might also help to encourage people not to just play with internal
> clocks, but to create clock devices and use them inside their drivers

Drivers playing with their own _internal_ clocks is entirely reasonable,
and it's entirely reasonable that they need not be exposed to userspace.
Take for instance your average PC, it has lots of clocks, especially
on the video card, none of them exposed.  They've gotten away with that
for decades.  Why are we any different?

"Oh, we're embedded, we're special!" I hear you cry.  No we're not
special at all.

> I don't think I like that fractional dividers aren't used; until
> someone can come up with a great reason why not (except, I think odd
> fractions dividers are not stable, so it could be just masking off the
> bottom bit of the fraction is the key) and someone tests it on more
> than one monitor..

Why fractional dividers are bad news - this is the TMDS clock that
resulted from the DI being configured with a fractional divider:

http://www.home.arm.linux.org.uk/~rmk/cubox/IMG_1545-adj.JPG

No, that's not blur.  That's the result of a fractional divider before
the HDMI phy PLL.  The result is that the PLL is frequency modulated,
and the connected TV basically tells the IMX to "piss off".

It's not "odd fractions" that are a problem - I believe Freescale's
code is correct, the problem is that no one has _thought_ about what
this is doing:

#ifdef WTF_IS_THIS
	/*
	 * Freescale has this in their Kernel. It is neither clear what
	 * it does nor why it does it
	 */
	if (div & 0x10)
		div &= ~0x7;
	else {
		/* Round up divider if it gets us closer to desired pix clk */
		if ((div & 0xC) == 0xC) {
			div += 0x10;
			div &= ~0xF;
		}
	}
#endif

I know what that's doing, it's not rocket science.  But if you have to
resort to using fractional dividers when you have another clock available
to generate a precise clock without the inherent instability caused by
fractional dividers, you might as well make use of it.
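
For reference, the DI divider is Q12.4 fixed point (see
ipu_di_clk_calc_div() in the patch below), so a check along these lines
- the function name is illustrative, not existing code - tells you
whether a requested rate can be hit without any fraction bits, and
hence without frequency-modulating the downstream PLL:

static bool ipu_di_div_is_fractional(unsigned long parent, unsigned long rate)
{
	u64 div = (u64)parent * 16;	/* Q12.4: low 4 bits are the fraction */

	do_div(div, rate);

	return (div & 0xf) != 0;	/* non-zero fraction => jittery timebase */
}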

Moreover, I'm not the only one to run into _real_ problems with fractional
dividers - other people have too with the iMX stuff, and have also had to
disable these fractional dividers too.
Matt Sealey Nov. 9, 2013, 12:23 a.m. UTC | #3
On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
>> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > That's where validating the modes you're prepared to support becomes
>> > important, and there's hooks in DRM for doing that.
>>
>> Nope! Not in the right place.
>
> Sure it's the right place.  It's the connector which gets to read the
> modes, and the connector is part of the DRM system.

It's really not the right place.

>> > If you can't support those with 148.5MHz dotclocks, then you have to return
>> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
>>
>> Here's the problem; connectors and encoders are initialized under DRM
>> before the display driver. For the connector to even KNOW what the
>> highest valid clock is, it would need to ask the lower level display
>> driver, since the connector and even encoder has no clue of this
>> limitation and shouldn't be hardcoded, or even know what the input
>> pixel clock capabilities are.
>
> That's because the way imx-drm is setup with its multitude of individual
> drivers is utter madness;

This is how all DRM cards are written, even the good ones. For the
connector to be able to ask the crtc "is this mode even supportable?"
it would need a mode_valid callback to call, down to the encoder (and
maybe the encoder would call crtc) and down down to the crtc. No such
callback exists for this exact reason. At the point the connector
pulls EDID modes and uses mode_valid, it's entirely possible and
nearly always true that it doesn't even HAVE an encoder or CRTC.

> there's no way that it will ever move out of
> drivers/staging as long as it messes around violating the internals of
> DRM by playing those kinds of games.  This was discussed at some
> considerable length at kernel summit, with me as the main motivator for
> it - including some private discussions with various people involved.

What it lacks, in this case, is not any functional code in those
components, but a workable model to glue those components together in
the correct order, on a per-board basis.

The appropriate card model isn't implemented - except in crazy
functions in imx-drm-core.c - the rest of it is absolutely fine.

What that doesn't fix is the underlying, inescapable fact above. At
the time the encoder or connector actually goes off and pulls EDID,
parses modes and does its thing, at least until about 6 months ago,
it was long, long before it was attached to a CRTC.

The ONLY thing that can happen is a call to the crtc mode_fixup()
which exists only to say "go fuck yourself" to a mode_set call.
Nothing ever culls these mode lists after the connector generates
them, because it owns that object and ditching items from it is
a semi-dangerous layering violation. So here is the user experience with
the current model:

* User clicks a button in GNOME to set the fanciest resolution they
can find, which is listed in their dialog
* User is told "that mode can't be set, sorry"
* User waits for you to go to sleep and then suffocates you with a pillow

And what it should be is:

* User clicks a button in GNOME to set the fanciest resolution
actually supported with this combination, because the drivers knew
enough to pre-filter invalid modes
* User always gets a valid mode set because it only ever reports valid modes

That second, consumer-oriented, model where the usable mode list is
predicated on results from the *entire* card and not just what the
monitor said, simply wasn't - and I assume still isn't - possible. Why
not? Because when a Radeon is plugged into a monitor it bloody well
supports it, and that's the end of it. People don't make displays that
modern graphics cards can't use. By the time 1080p TVs in common
circulation rolled around for consumer devices, WQXGA monitors already
existed, so desktop PC graphics cards followed suit pretty quickly.

However, some embedded devices have restrictions. I have a couple
devices at home that have a maximum resolution of 1280x720 - because
the SoC doesn't provide anything that can do more than a 75MHz pixel
clock or so. So, that sucks, but it's a real limitation of the SoC
that is essentially insurmountable.

In the case of the i.MX51, it was just never designed to be a
1080p-capable device. However, there is NOTHING in the architecture of
the chip except the maximum clock and some bus bandwidth foibles that
says it's impossible. I can run 1080p@30 and it operates great in 2D
and even does respectable 3D. The video decoders still work - if you
have a low enough bitrate/complexity movie, it *will* decode 1080p at
30fps. So artificially restricting the displays to "720p maximum" is
overly restrictive to customers, considering that 1366x768,1440x900
are well within the performance of the units. 1600x900, 1920x1080 (low
enough refresh rate) are still usable.

The problem is under DRM


> The DRM model is a card model, just like ALSA.  In this model, the
> subsystem gets initialised once, and at that point all components of
> the "card" are expected to be known.  The reverse is true when tearing
> down a card: the whole card goes away in one go, individual components
> do not.
>
> What this means for DRM is that all CRTCs, connectors and encoders are
> registered with DRM entirely within the card-level ->load callback,
> and all of those are torn down at the ->unload callback.  Between those
> two points in time, nothing in the configuration of the "card" ever
> changes.

Right, but there is no card driver, and all the current "embedded" DRM
drivers get around this by making some crass assumptions about the
connections - most of them just have an LCD controller and HDMI
controller built in, so they haven't trampled over the case where the
device tree allowed an i2c device to probe and initialize and somehow
needs discovery to glue into the framework as an encoder. The "encoder
slave" system right now is totally futzed; while it would be a GREAT
place to implement every encoder-connector combination that card
drivers need to pick up, it doesn't have the right functionality.

The reason there's no card driver.. discussed below..

> The same is true of ALSA: ALSA too does not support hotplugging
> individual components.  This is why we have ASoC - ASoC handles the
> grouping up of individual components, works out when all components
> are available, and only then does it register the "card" with ALSA.
> When any of those components go away, ASoC pulls the whole card from
> ALSA.

There's a distinct lack of actual procedure and process standardized
for collecting card data from device tree, and in the case of a single
SoC, the current ALSA/DRM model will be one driver for every board,
and every board needs its own compatible property, and every board
does *the exact same thing*. You can see this with the current
imx{51|53|6x}-board-{codec} driver model. Every single driver is doing
almost identical stuff, where they share codecs they are doing
ABSOLUTELY identical stuff. There is stuff in the card level driver
that doesn't even do anything at the card level (and the card level
always calls down for this functionality to the codec level).

If device trees were adopted with the intent to allow removal of the
"board setup code written in C and glommed in a directory with the SoC
support code", implementing "card support" for every board is just
moving that crappy platform code from one directory to another with
absolutely zero benefit or flexibility or ability to expose a common
configuration.

What happened with ALSA is the same shit we're going to get with DRM.

> No, not at all.  If you build up the set of components first, before
> involving DRM, then you have the full information.  I'm doing that
> today: the DRM "card" can only finish initialisation when all known
> connectors can find their CRTCs within the DRM card.  If any CRTC
> (== IPU DIs) specified in the DT file for a "connector" (eg, HDMI
> interface, lvds bridge, etc) are not present, the DRM card doesn't
> initialise.

And this is what's missing. So, here's the question: how do you
implement a card-level driver model that marshals this information
together and submits it to the separate components without, in the
process, creating a whole new driver framework? The need for having
the Connector ask the CRTC what a valid mode is doesn't make sense
here, in that DRM doesn't have such a framework to do such a thing
(reasons above). In the i.MX51 case, the maximum display clock is
*absolutely not* a function of the connector, but it must know this
information in the same way that it would also need to know the
maximum TMDS output clock for the encoder (bzzt - sorry, sir, it
doesn't) so that it can validate the modes, and in the way it knows to
gather the maximum supported TMDS clock from the monitor EDID (if
present).

> Once you have that in place, you have a way to find out the properties
> of the CRTC, and you then have a way to validate the modes properly
> against all parts of the display system.

Right, and your card driver "isn't DRM" in the same way as DRM has a
card model - it operates on the level of a PCIe device driver that
probes and magically sets up *all* its subordinate buses, encoders,
connectors.

You can do that with i.MX6 HDMI because the HDMI is built into the SoC
- if you added a node for it and added a cross-reference to it, then
it's fine.

What happens in the case where you defined an HDMI transmitter as an
external I2C device - it's connected on this bus, Linux already probed
the thing. Your card model would have to go out and find this i2c
device somehow, and there's not a great way of doing it. What if the
EDID can't be obtained by straight I2C and you have to set some weird
bits inside the transmitter to access the DDC bus? That requires an
API that doesn't exist, and any topology-finding drivers that are
built off the single use case of "we just attach a panel direct to the
SoC by some parallel or differential link" rather than the
intermediate-transmitters case is going to miss out the finer points
of the intermediate-transmitters case.

>> True, and it looks a lot like the Freescale drivers now,
>
> Not at all.  The Freescale drivers use the clk API as this did, and just
> as badly.  I'm afraid that by making that comment, you've just shown that
> you much prefer writing long verbose emails rather than actually reading
> and understanding the subject for which you're responding.
>
> I refer you to this:
>
> http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_3.0.35_4.1.0#n155
>
> which clearly shows Freescale's 4.1.0 BSP using the clk API to control
> the DI clock mux.

I understand THAT driver like the back of my hand..

>> think creating a clockdev clock is probably the "correct" way to do
>> it, especially since in debug situations you'd be able to see this -
>> and it's value, and it's current parent - in sysfs along with the
>> other clocks.
>
> The clk API just gets in the way of making the correct decisions.
> Really, it does.  We need to calculate the divider to work out what
> is the right clock to use, or whether we can use a clock.  Having
> that hidden behind the clk API does not give us the ability to make
> that decision.

It shouldn't ever have been hidden anyway; someone completely
over-abstracted the clock API at some point by making it a
query/get/set interface for a generic clock object, where the only
reasonable values are the rate and, in a limited fashion, the parent.

Why not add clk_get_divider() which essentially returns 0 for anything
that's not actually a real divider clock?
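
Purely as an illustration of that suggestion (clk_get_divider() does
not exist in the clk API today; the prototype and its use here are
hypothetical), the consumer side would look something like:

	/* hypothetical: returns the raw divider, or 0 if clk isn't a divider */
	unsigned int clk_get_divider(struct clk *clk);

	div = clk_get_divider(di->clk_di_pixel);
	if (!div)
		/* not a plain divider clock; fall back to reading the register */
		div = (ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff) / 16;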

>> It might also help to encourage people not to just play with internal
>> clocks, but to create clock devices and use them inside their drivers
>
> Drivers playing with their own _internal_ clocks is entirely reasonable,
> and it's entirely reasonable that they need not be exposed to userspace.
> Take for instance your average PC, it has lots of clocks, especially
> on the video card, none of them exposed.  They've gotten away with that
> for decades.  Why are we any different.
>
> "Oh, we're embedded, we're special!" I hear you cry.  No we're not
> special at all.

No, we're not special in that there's no functional difference, but
the real situation is that as a developer it's REALLY difficult to get
the information you want from some internal clock somewhere compared
to a core SoC clock inside the clkdev framework, and every internal
clock needs its own magic, custom implementation - which just makes
maintaining it more work since 'everyone' then has to learn not only
how to derive operation from a generic clkdev device, but how the
actual implementation of the internal clock works as well underneath
it.

Which is what clkdev was supposed to get rid of, right? What it does
is make simple cases much easier.

I will agree with you that in this case, this is totally screwy, but I
don't want to see it discourage people from still doing it... that
said........

>> I don't think I like that fractional dividers aren't used; until
>> someone can come up with a great reason why not (except, I think odd
>> fractions dividers are not stable, so it could be just masking off the
>> bottom bit of the fraction is the key) and someone tests it on more
>> than one monitor..
>
> Why fractional dividers are bad news - this is the TMDS clock that
> resulted from the DI being configured with a fractional divider:
>
> http://www.home.arm.linux.org.uk/~rmk/cubox/IMG_1545-adj.JPG
>
> No, that's not blur.  That's the result of a fractional divider before
> the HDMI phy PLL.  The result is that the PLL is frequency modulated,
> and the connected TV basically tells the IMX to "piss off".

Understood.

> It's not "odd fractions" that are a problem

Maybe I should have made myself a bit clearer.. the further away from
0.0 or 0.5 fractional part you get, the less it works, and the further
up from 0.5 the less reliable it is regardless.

There are some odd (as in oddball, not the opposite to even) fractions
that don't make any sense and I found I got more monitors to work if I
just masked off the bottom bit of the divider or bottom two, but that
broke a couple monitors more than it fixed.

The effect for me, if I quit using the fractional divider
completely, was that it would always get caught by the "clock within a
reasonable distance of the requested rate" catch in many cases, and
then we ignore it and use an external clock, and the amount of testing
someone can do with only 11 monitors, limited spare units, and not a
lot of time to spend is limited. Which you propose later.. who gives a
shit if we can't use fractional rates? Just set the clock right!

There's a funky reason why dropping the divider actually works, or why
probably masking off certain parts of it for certain modes works..

> - I believe Freescale's code is correct

I disagree with that and I'll explain why :)

> the problem is that no one has _thought_ about what this is doing:

I disagree with that since I was thinking about it in 2009.. I just
can't figure out EXACTLY what the real fix is, I never had enough
hardware, time, or patience.

I totally agree with the assessment Alexander had, and I trust your
scopes. A screwy clock gets generated. I don't think it's because of
the "fractional part" though.

They've been doing odd tricks, since the earliest BSP I could find,
and I started working on it at this BSP version;

        http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_disp.c?h=imx_2.6.28#n865

Which was modified at this commit, going further back:

        http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/commit/drivers/mxc/ipu3/ipu_disp.c?h=imx_2.6.28&id=1c930f0b2f1a67c7eb46a045c7517376e8ecaa32

I only fiddled with this thing for the shortest time before moving to
2.6.31 as fast as possible; the code reappears in later BSPs just
ported in with no git history... but here, note that set_rate sets
DI_BS_CLKGEN1, too. There's not really a good reason why it WOULDN'T.

        http://git.freescale.com/git/cgit.cgi/imx/linux-2.6-imx.git/tree/drivers/mxc/ipu3/ipu_common.c?h=imx_2.6.31_10.08.01#n151

And we get to more modern times, and DI_BS_CLKGEN1 is not in the clk API code.

The rounding essentially says you can only have a fraction of 1/2 if
your clock divider integer part is odd. If your clock divider integer
part is even, and the fractional part has a particular set of values
(0.75, 0.8125, 0.875, 0.9375), it'll bump it to the next
(odd-integer-part), unfractioned divider. I never got how that even
worked, but it is, in my opinion, somewhat to do with this gem in
setting DI_BS_CLKGEN1:

        div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
        div = div / 16; /* Now divider is integer portion */

        /* Setup pixel clock timing */
        /* Down time is half of period */
        ipu_di_write(di, (div << 16), DI_BS_CLKGEN1); <-- right here

Recall that the original code (1c930f0b2f1a) also set the "up" portion here.

If you will notice, it writes the integer part of the divider (so in
Alexander's case, 0x3) to the "down period" component of the second
clock generator register by shifting the entire thing up 16 bits.

According to the docs, DI_BS_CLKGEN1[16] is ANOTHER FRACTION BIT (as
is DI_BS_CLKGEN1[0] for the "up period").

My theory, which I could never confirm, is that DI_BS_CLKGEN1 has always
been misprogrammed because it's off-by-one on the actual integer part
of the divider, and there's probably a case for setting the delay
period (top of DI_BS_CLKGEN0) and the "up period" of DI_BS_CLKGEN1
too. I just never had the amount of hardware I'd need to do a
functional test with real devices that gave us those clock dividers.

In Alex's case with his 3.5 divider, the down period is being set to
1.5 (i.e. (0x38 >> 4) <<16) in that bit position. With a 3.0 divider,
dropping 1.5 in the down period is sorta correct. Clock works. Yay?
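
To put numbers on that (following the bit layout described above):

	GEN0 divider = 0x38	-> 3.5 in Q12.4 (3 + 8/16)
	div = 0x38 / 16 = 3	-> integer part only
	GEN1 = 3 << 16		-> bits 17:16 = binary 11, but bit 16 is a
				   fraction bit, so the "down period"
				   decodes as 1.5, not 3.0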

It also *ALWAYS* works in the external clock case, since the external
clock (as per Freescale) is always used as parent, generating the DI
clock with an exact divider of 2, 4, or 8.

Does that maybe go so far as to explain why those fractional dividers
really don't work properly?

There's a REALLY good explanation in the i.MX6QDRM manual around
"34.7.10.3 Timing generator" which gives exactly how this clock is set
up. What you're expecting as a straight fractional division of a clock
isn't true - every pin output on the DI is derived from a fundamental
timebase which is defined by that fractional divider in DI_BS_CLKGEN0.
What actually goes to the display is THEN derived from that
fundamental timebase modified by the valueS (plural!) in
DI_BS_CLKGEN1. There's a significant amount of math to figure out
whether the exact display timing is going to be valid, not being done
here, but it turns out through pure luck, it tends to work out for a
couple cases..

I never sat down and thought of the actual math required and since it
impacts ALL the waveform generation too, it seemed like a lot of work
I wasn't prepared to do at the time. I don't think I was paid enough
for it, when I was being paid to do it. I don't get paid to do it at
all right now (thank Glob) and nobody's giving me any free hardware
that makes me want to do it... so it's in your court.

> I know what that's doing, it's not rocket science.  But if you have to
> resort to using fractional dividers when you have another clock available
> to generate a precise clock without the inherent instability caused by
> fractional dividers

If DI_BS_CLKGEN1 can be proven mis-programmed and fixing that fixes
the instability, then awesome. I'd like to see that investigated
though.

If not, sure, I agree with you 100%. There's no reason

> Moreover, I'm not the only one to run into _real_ problems with fractional
> dividers - other people have too with the iMX stuff, and have also had to
> disable these fractional dividers too.

Well I've been dealing with them since the i.MX51 taped out and
honestly.. it worked a lot better with the 2.6.26 kernel with the
original hacks, but at that time I don't recall we had any support for
an external clock. As soon as 2.6.28 rolled round we found some weird
problems, and Freescale added code to use external clocks and we
stopped caring because it pretty much always worked, with some
veeerrrrrry strange caveats - most of which were masked by API
deficiencies in the framebuffer subsystem and a distinct inability to
ask the IPU driver what clock it actually set from the code in our
transmitter driver.

Which is, back to an earlier point, *why I want real clkdevs internal
to drivers* since it makes it a lot easier to pass the information
around in a standardized way. The transmitter may need to know
*precisely* what pixel clock was set, and at the encoder level what I
can get back is clk_round_rate rounded by the crtc to the exact clock
it's outputting, rather than recalculating the clock from the mode
information (syncs+totals+blah, you patched this out in your fixes)
and not being able to take into account the rounding EXCEPT if I add
my own API.. to fetch the information. It's more reasonable if the
encoder can just request the crtc's clk handle and do what it may,
than for a special API for passing "mode clock information" came into
it.

What we've not ascertained in reality is not "does it generate a weird
clock" but why did Freescale only limit only those very specific
fractions (why would (if (div & 0xc)==0xc) div+=0x1; div &= 0xf be
"working" for them?) and why don't they set the "up period" in their
driver since 2009? And what happens if we really, really properly
configure the clock generation?

Matt Sealey <neko@bakuhatsu.net>
Russell King - ARM Linux Nov. 9, 2013, 12:53 a.m. UTC | #4
On Fri, Nov 08, 2013 at 06:23:37PM -0600, Matt Sealey wrote:
> On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Tue, Nov 05, 2013 at 04:39:40PM -0600, Matt Sealey wrote:
> >> On Wed, Oct 30, 2013 at 2:01 PM, Russell King - ARM Linux
> >> <linux@arm.linux.org.uk> wrote:
> >> > That's where validating the modes you're prepared to support becomes
> >> > important, and there's hooks in DRM for doing that.
> >>
> >> Nope! Not in the right place.
> >
> > Sure it's the right place.  It's the connector which gets to read the
> > modes, and the connector is part of the DRM system.
> 
> It's really not the right place.
> 
> >> > If you can't support those with 148.5MHz dotclocks, then you have to return
> >> > MODE_CLOCK_HIGH from the connector's mode_valid callback.
> >>
> >> Here's the problem; connectors and encoders are initialized under DRM
> >> before the display driver. For the connector to even KNOW what the
> >> highest valid clock is, it would need to ask the lower level display
> >> driver, since the connector and even encoder has no clue of this
> >> limitation and shouldn't be hardcoded, or even know what the input
> >> pixel clock capabilities are.
> >
> > That's because the way imx-drm is setup with its multitude of individual
> > drivers is utter madness;
> 
> This is how all DRM cards are written, even the good ones. For the
> connector to be able to ask the crtc "is this mode even supportable?"
> it would need a mode_valid callback to call, down to the encoder (and
> maybe the encoder would call crtc) and down down to the crtc. No such
> callback exists for this exact reason. At the point the connector
> pulls EDID modes and uses mode_valid, it's entirely possible and
> nearly always true that it doesn't even HAVE an encoder or CRTC.

Now look at the bigger picture.  What makes the decision about whether
to use a mode with a particular CRTC?  Is it:

(a) DRM
(b) Userspace

The answer is not (a) - DRM just does what userspace instructs.  What
this means is that you can't do per-CRTC validation of mode settings
without big changes to the API, which aren't going to happen.

So the best you can do is to limit the displayed modes according to
what the hardware in general is able to do.

Luckily, with imx-drm, in the case of single IPUs, both "crtcs" have
exactly the same capability as far as clocks go - which is common with
every other DRM system so far.  So really, there's no need to know which
CRTC is going to be associated with the connector - we already know the
capabilities there.

> What it lacks, in this case, is not any functional code in those
> components, but a workable model to glue those components together in
> the correct order, on a per-board basis.
> 
> The appropriate card model isn't implemented - except in crazy
> functions in imx-drm-core.c - the rest of it is absolutely fine.

Well, I've actually fixed the problem *right* *now*.

> The ONLY thing that can happen is a call to the crtc mode_fixup()
> which exists only to say "go fuck yourself" to a mode_set call.
> Nothing ever culls these mode lists after the connector generates
> them, because it owns that object and ditching items from it is
> semi-dangerous layering violation. So here is the user experience with
> the current model:
> 
> * User clicks a button in GNOME to set the fanciest resolution they
> can find, which is listed in their dialog
> * User is told "that mode can't be set, sorry"
> * User waits for you to go to sleep and then suffocates you with a pillow
> 
> And what it should be is:
> 
> * User clicks a button in GNOME to set the fanciest resolution
> actually supported with this combination, because the drivers knew
> enough to pre-filter invalid modes
> * User always gets a valid mode set because it only ever reports valid modes
> 
> That second, consumer-oriented, model where the usable mode list is
> predicated on results from the *entire* card and not just what the
> monitor said, simply wasn't - and I assume still isn't - possible. Why
> not? Because when a Radeon is plugged into a monitor it bloody well
> supports it, and that's the end of it. People don't make displays that
> modern graphics cards can't use. By the time 1080p TVs in common
> circulation rolled around for consumer devices, WQXGA monitors already
> existed, so desktop PC graphics cards followed suit pretty quickly.

Sorry, no, I'm not buying your arguments here - you may be right but
your apparent bias towards "oh, DRM was written for Radeon" is soo
wrong.  Maybe you should consider that the DRM maintainer works for
Intel, who are a competitor of AMD not only in the CPU market but also
the video market too.

However, even so.  Getting back to the hardware we have, which is
imx-drm, there is no difference between the capabilities of the two
"CRTCs" in a single IPU as far as mode pixel rates are concerned.
There _may_ be a difference between the two IPUs on IMX6Q, but not
within a single IPU.

> However, some embedded devices have restrictions. I have a couple
> devices at home that have a maximum resolution of 1280x720 - because
> the SoC doesn't provide anything that can do more than a 75MHz pixel
> clock or so. So, that sucks, but it's a real limitation of the SoC
> that is essentially insurmountable.

Right, so turning everything into micro-components is a bad idea.
That's already been said (I think I've already said that about imx-drm.)

> In the case on the i.MX51, it was just never designed to be a
> 1080p-capable device. However, there is NOTHING in the architecture of
> the chip except the maximum clock and some bus bandwidth foibles that
> says it's impossible. I can run 1080p@30 and it operates great in 2D
> and even does respectable 3D. The video decoders still work - if you
> have a low enough bitrate/complexity movie, it *will* decode 1080p at
> 30fps. So artificially restricting the displays to "720p maximum" is
> overly restrictive to customers, considering that 1366x768,1440x900
> are well within the performance of the units. 1600x900, 1920x1080 (low
> enough refresh rate) are still usable.
> 
> The problem is under DRM

Err no.  The reason 1080p@30 works is because it uses the same pixel
rate as 1280x720.  74.25MHz.

So the problem is with your thinking.  What you're thinking is "it won't
do 1080p so we have to deny modes based on the resolution".  There's
absolutely no problem what so ever restricting the set of available
modes based on _pixel_ _clock_ _rate_ in DRM.  DRM fully supports this.
There is no problem here.

So you _can_ have 1080p at 30Hz if you have a bandwidth limitation.
All it takes is for the _correct_ limitations to be imposed rather than
the artificial ones you seem to be making up.

And at this point I give up reading your diatribe.  You've been told
many times in the past not to write huge long essays as emails, because
frankly people won't read them.  It's 1 am, I'm not going to spend
another hour or more reading whatever you've written below this point
because it's another 300 lines I just don't have time to read at this hour.

If you'd like to rephrase in a more concise manner then I may read
your message.
Matt Sealey Nov. 9, 2013, 6:11 a.m. UTC | #5
On Fri, Nov 8, 2013 at 6:53 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Nov 08, 2013 at 06:23:37PM -0600, Matt Sealey wrote:
>> On Thu, Nov 7, 2013 at 11:29 AM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>
> Now look at the bigger picture.  What makes the decision about whether
> to use a mode with a particular CRTC?  Is it:
>
> (a) DRM
> (b) Userspace
>
> The answer is not (a) - DRM just does what userspace instructs.  What
> this means is that you can't do per-CRTC validation of mode settings
> without big changes to the API, which aren't going to happen.

That's exactly what I was saying in the first place, and what Alex
Deucher said to me a year ago after I spent 2 days straight crashing
an i.MX51 trying to figure out a way to get the connector to ask the
CRTC for the right information.

So I accept that we can't do validation of modes at a CRTC level, but
what's the solution? I don't accept that it would be to shove mode
culling policy to userspace.

xorg-video-modesetting driver is a gold standard - if all users want
is to get a linear framebuffer with a particular mode (and we should
all be happy for this on ARM since it means KMS is working, hooray!),
it should not have to know which modes will actually work or not or be
modified to be platform specific.

In the case of "card drivers" for DRM,  on the kernel side, having a
"card" driver per board to micromanage settings will get as unwieldy
as having multiple platform-specific xorg-video-modesetting patches,
when there are 10 or 20 boards based on a particular chip in mainline,
and 20 or 30 SoCs supported in active use. ALSA is getting away with
it right now because barely anyone has gotten to the point of having
working audio with device tree except the i.MX6 and i.MX5 cases. It
won't scale as it goes forward.

I am betting the one you "have *right* *now*" is the above: you wrote
a driver which, given a very specific set of device tree entries with
specific compatible properties in combination, specifically
initializes several i.MX6-specific components together in the right
order. This also won't scale, going forward.

> Luckily, with imx-drm, in the case of single IPUs, both "crtcs" have
> exactly the same capability as far as clocks go - which is common with
> every other DRM system so far.  So really, there's no need to know which
> CRTC is going to be associated with the connector - we already know the
> capabilities there.

That's not the point I was trying to get across.

> If you'd like to rephrase in a more concise manner then I may read
> your message.

Maybe I am just not explaining it well enough. Here it is in the
shortest way I can do it.

* i.MX6Q - two discrete IPUs, each with two DIs. The IPUs have
independent clocks, up to 200MHz for the HSC.
* i.MX53/50 - one discrete IPU with two DIs. IPU clock up to 150MHz for the HSC.
* i.MX51 - one discrete IPU with two DIs. IPU clock up to 133MHz for the HSC.

Same driver.

DI can divide HSC clock or accept external clock to generate, using a
fractional divider, any clock rate it wants. This clock becomes the
"fundamental timebase" for the DISP_CLK, which is generated as
expected, based on "up" and "down" period values for the fundamental
timebase.

Whether HSC clock or external clock, the maximum DISP_CLK which goes
out to the display is bounded by the current HSC clock rate.

Two problems.

* i.MX51 maximum clock rate (133MHz) is lower than standard pixel
clock rate for 1080p60 and other high-resolution modes with
high-refresh rates

* DI_BS_CLKGEN0 and 1 need to be programmed correctly to generate a
correct DISP_CLK out.

Problem without solution (i.e. DRM filters modes back-asswards):

* i.MX51 connector is probably not part of the SoC, but an independent
and totally divorced component from the IPU. One example is an IPU
parallel display connected to an active HDMI encoder, which may be
provided by companies such as Texas Instruments or Silicon Image.
These things exist in the wild. It is not reasonable for a Silicon
Image transmitter driver as a generic i2c device (which it really is)
to have intimate knowledge of the i.MX51. It is not reasonable for the
Silicon Image transmitter to be predicated upon a specific SoC.

DRM offers no way for an encoder/connector to find out this
information from the correct place - the CRTC driver - at the time it
fetches EDID from the monitor, or gains mode information some other
way. DRM only offers a way to use the maximum TMDS clock from the EDID
to cull modes the *monitor* can't handle in the CVT/GTF mode timing
case (or disable usage of CVT/GTF completely).

Problem with solution (i.e. don't just strip fractions please):

Current programming methodology for DI_BS_CLKGEN1 only takes into
account the exact case where the DI_BS_CLKGEN0 divider is an integer and
is stuffed into the CLKGEN1 "down period" shifted one bit right, which is
why the "strip the fractions" part works.

CLKGEN1 only has a Q8.1 fractional divider whereas CLKGEN0 has a Q12.4
fractional divider.

Any use of anything but the integer part of the CLKGEN0 divider
probably cannot be represented as an exact value in CLKGEN1 without
the original parent clock being multiplied by a suitable value or
CLKGEN0 divider (Q12.4) being further adjusted to allow valid CLKGEN1
values (Q8.1).

Using CLKGEN1 better and doing more comprehensive 'upstream' clock
management may give better results. In the configurations Alex and
yourself tried, it is not possible to derive the correct DISP_CLK from
the fundamental timebase created by CLKGEN0 using the values CLKGEN1
was programmed with. As in my previous mail, a GEN0 divider of 3.5 ends
up as a GEN1 down period of 1.5. A divider of 3.0 also ends up as a GEN1
down period of 1.5.

In this case the solution is to correct the fundamental timebase such
that it is possible to generate that value. Assuming input clock
133MHz, expected rate 38MHz. Simplest case divider is 133/38 = 3.5.
With current code, "down period" will be 1.5 instead of (3/2) but it
should be 1.75 which we cannot represent because of the single
fraction bit.

Possible fix? Set divider to 1.75 instead. DI clock generator will
create a 76MHz clock as fundamental timebase. This *shouldn't* go
directly out on any pins but it's used to synchronize everything else
on the waveform generators. What I could never work out is what to do
next; set GEN1 down period to 3.5? Or 2.0? The diagrams don't explain
whether the period is the relation from timebase to parent clock, or
timebase to DISP_CLK very clearly, but it seems 2.0 would be the
correct one in this instance (it makes more sense given the current
working divider/down period settings).
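
Putting numbers on the above (assuming the 2.0 reading of the down
period from the previous paragraph):

	133MHz / 38MHz = 3.5             -> GEN0 divider 0x38 in Q12.4
	ideal GEN1 down period = 3.5 / 2 = 1.75, not representable in Q8.1
	current code writes 3 << 16      -> decodes as 1.5 (bit 16 is a fraction bit)
	possible fix: GEN0 = 1.75 (0x1c)  -> 76MHz timebase, GEN1 down = 2.0
	                                  -> 38MHz DISP_CLK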

Here's my notes from sometime in 2010:

"When the divider is calculated for the pixel clock, what it should be
trying to generate is a Q8.1 number for the "down" and/or "up" periods
in GEN1 (you could just set one of them, dependent on clock polarity
setting, assuming you want your data clocked at the edge of the input
clock.. all stuff the driver kinda glosses over) rather than
concentrating on the fractional divider in GEN0 first. The Q12.4
divider is there so you have more flexibility when you find a perfect
Q8.1 for the up/down clock periods depending on the input clock."

Apart from that contrived case above (I used Alex's case), I couldn't
sit down and write that algorithm even if I had all the coffee in
Colombia (and some other stuff too); I've had 3 years to try and just
didn't. I'm dealing with getting the platform that explains the cases
I have up and running in the first place, dealing with *all* the same
problems you are, with the added problem that sometimes the ARM doesn't
work. I'm not rushing to get it upstream by a deadline - sometimes my
wife complains that I spend too much time in my office after work - so when
she lets me be, and it actually boots reliably and I figure out a
couple U-Boot problems, maybe we'll get in sync on the issue and be
fixing the same things in collaboration, but until then... I'm
suggesting you might want to do it as it might cause less weird
results later down the road. Of course if you TL;DR it, then you can
live with the broken display driver for as long as you like. Concise?

Thanks :)

Matt Sealey <neko@bakuhatsu.net>
Russell King - ARM Linux Nov. 9, 2013, 10:28 a.m. UTC | #6
On Sat, Nov 09, 2013 at 12:11:51AM -0600, Matt Sealey wrote:
> That's exactly what I was saying in the first place, and what Alex
> Duecher said to me a year ago after I spent 2 days straight crashing
> an i.MX51 trying to figure out a way to get the connector to ask the
> CRTC for the right information.

I still completely disagree with you on that.  I've yet to see a
situation where you have different restrictions on what can be done
depending on which "CRTC" in the SoC you connect an interface to.

> In the case of "card drivers" for DRM,  on the kernel side, having a
> "card" driver per board to micromanage settings will get as unwieldy
> as having multiple platform-specific xorg-video-modesetting patches,

*sigh* completely the wrong end of the stick, and I bet most of the rest
of this email is pounded out based on that.

> I am betting the one you "have *right* *now*" is the above, you wrote
> a driver which, given a very specific set of device tree entries with
> specific compatible properties in combination, specifically
> initializes several i.MX6-specific components together in the right
> order. This also won't scale, going forward.

What utter rubbish.  I have nothing further to add other than you are
wrong.  I'm not going to give you any further information for you to
pound your poor keyboard with to compose another verbose reply which
has the wrong end of the stick.

> > If you'd like to rephrase in a more concise manner then I may read
> > your message.
> 
> Maybe I am just not explaining it well enough. Here it is in the
> shortest way I can do it.
> 
> * i.MX6Q - two discrete IPUs, each with two DIs. The IPUs have independent
> clocks, up to 200MHz for the HSC.
> * i.MX53/50 - one discrete IPU with two DIs. IPU clock up to 150MHz for the HSC.
> * i.MX51 - one discrete IPU with two DIs. IPU clock up to 133MHz for the HSC.
> 
> Same driver.

Right, so what you've just told me in a concise way is that the
restrictions are per-SoC, not per-CRTC as you've just been claiming in
your ridiculously verbose emails (the rest of your email cut without
reading).

So what we do is we arrange for the imx-drm connector layers to call
into the imx-drm-core layer, to ask the imx-drm CRTC layer what the
maximum pixel clock is.

We then have the connector compare the maximum pixel clock rate with
the clock rate required for the mode, and if the mode has a pixel
clock rate greater than the CRTC can provide, we return MODE_CLOCK_HIGH.

If done correctly, this doesn't require any code in the connector
layer: you arrange for imx-drm-core to provide a _generic_ cross-
connector function which provides this information, and you point
the connector's .mode_valid at that generic function.
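
Something along these lines would do it (a sketch only - the
imx_drm_max_pixelclock() helper is made up here to stand in for
whatever imx-drm-core ends up exporting; it is assumed to return the
per-SoC limit in kHz so it compares directly with mode->clock):

#include <drm/drmP.h>
#include <drm/drm_crtc_helper.h>

/* hypothetical imx-drm-core helper: per-SoC pixel clock limit in kHz */
unsigned long imx_drm_max_pixelclock(struct drm_connector *connector);

static int imx_drm_connector_mode_valid(struct drm_connector *connector,
					struct drm_display_mode *mode)
{
	/* mode->clock is the dotclock in kHz */
	if (mode->clock > imx_drm_max_pixelclock(connector))
		return MODE_CLOCK_HIGH;

	return MODE_OK;
}

Each connector then points .mode_valid in its drm_connector_helper_funcs
at that one function, and the probe helpers prune anything the CRTC
can't drive before userspace ever sees it.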

There is no problem here.
diff mbox

Patch

diff --git a/drivers/staging/imx-drm/ipu-v3/ipu-di.c b/drivers/staging/imx-drm/ipu-v3/ipu-di.c
index d766e18bfca0..82a9ebad697c 100644
--- a/drivers/staging/imx-drm/ipu-v3/ipu-di.c
+++ b/drivers/staging/imx-drm/ipu-v3/ipu-di.c
@@ -19,9 +19,6 @@ 
 #include <linux/io.h>
 #include <linux/err.h>
 #include <linux/platform_device.h>
-#include <linux/clk.h>
-#include <linux/clk-provider.h>
-#include <linux/clkdev.h>
 
 #include "imx-ipu-v3.h"
 #include "ipu-prv.h"
@@ -33,10 +30,7 @@  struct ipu_di {
 	struct clk *clk_di;	/* display input clock */
 	struct clk *clk_ipu;	/* IPU bus clock */
 	struct clk *clk_di_pixel; /* resulting pixel clock */
-	struct clk_hw clk_hw_out;
-	char *clk_name;
 	bool inuse;
-	unsigned long clkflags;
 	struct ipu_soc *ipu;
 };
 
@@ -141,130 +135,6 @@  static inline void ipu_di_write(struct ipu_di *di, u32 value, unsigned offset)
 	writel(value, di->base + offset);
 }
 
-static int ipu_di_clk_calc_div(unsigned long inrate, unsigned long outrate)
-{
-	u64 tmp = inrate;
-	int div;
-
-	tmp *= 16;
-
-	do_div(tmp, outrate);
-
-	div = tmp;
-
-	if (div < 0x10)
-		div = 0x10;
-
-#ifdef WTF_IS_THIS
-	/*
-	 * Freescale has this in their Kernel. It is neither clear what
-	 * it does nor why it does it
-	 */
-	if (div & 0x10)
-		div &= ~0x7;
-	else {
-		/* Round up divider if it gets us closer to desired pix clk */
-		if ((div & 0xC) == 0xC) {
-			div += 0x10;
-			div &= ~0xF;
-		}
-	}
-#endif
-	return div;
-}
-
-static unsigned long clk_di_recalc_rate(struct clk_hw *hw,
-		unsigned long parent_rate)
-{
-	struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
-	unsigned long outrate;
-	u32 div = ipu_di_read(di, DI_BS_CLKGEN0);
-
-	if (div < 0x10)
-		div = 0x10;
-
-	outrate = (parent_rate / div) * 16;
-
-	return outrate;
-}
-
-static long clk_di_round_rate(struct clk_hw *hw, unsigned long rate,
-				unsigned long *prate)
-{
-	struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
-	unsigned long outrate;
-	int div;
-	u32 val;
-
-	div = ipu_di_clk_calc_div(*prate, rate);
-
-	outrate = (*prate / div) * 16;
-
-	val = ipu_di_read(di, DI_GENERAL);
-
-	if (!(val & DI_GEN_DI_CLK_EXT) && outrate > *prate / 2)
-		outrate = *prate / 2;
-
-	dev_dbg(di->ipu->dev,
-		"%s: inrate: %ld div: 0x%08x outrate: %ld wanted: %ld\n",
-			__func__, *prate, div, outrate, rate);
-
-	return outrate;
-}
-
-static int clk_di_set_rate(struct clk_hw *hw, unsigned long rate,
-				unsigned long parent_rate)
-{
-	struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
-	int div;
-	u32 clkgen0;
-
-	clkgen0 = ipu_di_read(di, DI_BS_CLKGEN0) & ~0xfff;
-
-	div = ipu_di_clk_calc_div(parent_rate, rate);
-
-	ipu_di_write(di, clkgen0 | div, DI_BS_CLKGEN0);
-
-	dev_dbg(di->ipu->dev, "%s: inrate: %ld desired: %ld div: 0x%08x\n",
-			__func__, parent_rate, rate, div);
-	return 0;
-}
-
-static u8 clk_di_get_parent(struct clk_hw *hw)
-{
-	struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
-	u32 val;
-
-	val = ipu_di_read(di, DI_GENERAL);
-
-	return val & DI_GEN_DI_CLK_EXT ? 1 : 0;
-}
-
-static int clk_di_set_parent(struct clk_hw *hw, u8 index)
-{
-	struct ipu_di *di = container_of(hw, struct ipu_di, clk_hw_out);
-	u32 val;
-
-	val = ipu_di_read(di, DI_GENERAL);
-
-	if (index)
-		val |= DI_GEN_DI_CLK_EXT;
-	else
-		val &= ~DI_GEN_DI_CLK_EXT;
-
-	ipu_di_write(di, val, DI_GENERAL);
-
-	return 0;
-}
-
-static struct clk_ops clk_di_ops = {
-	.round_rate = clk_di_round_rate,
-	.set_rate = clk_di_set_rate,
-	.recalc_rate = clk_di_recalc_rate,
-	.set_parent = clk_di_set_parent,
-	.get_parent = clk_di_get_parent,
-};
-
 static void ipu_di_data_wave_config(struct ipu_di *di,
 				     int wave_gen,
 				     int access_size, int component_size)
@@ -528,42 +398,58 @@  static void ipu_di_sync_config_noninterlaced(struct ipu_di *di,
 		ipu_di_sync_config(di, cfg_vga, 0, ARRAY_SIZE(cfg_vga));
 }
 
-int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
+static void ipu_di_config_clock(struct ipu_di *di,
+	const struct ipu_di_signal_cfg *sig)
 {
-	u32 reg;
-	u32 di_gen, vsync_cnt;
-	u32 div;
-	u32 h_total, v_total;
-	int ret;
-	unsigned long round;
-	struct clk *parent;
+	struct clk *clk;
+	unsigned clkgen0;
+	uint32_t val;
 
-	dev_dbg(di->ipu->dev, "disp %d: panel size = %d x %d\n",
-		di->id, sig->width, sig->height);
+	if (sig->clkflags & IPU_DI_CLKMODE_EXT) {
+		/*
+		 * CLKMODE_EXT means we must use the DI clock: this is
+		 * needed for things like LVDS which needs to feed the
+		 * DI and LDB with the same pixel clock.
+		 */
+		clk = di->clk_di;
+
+		if (sig->clkflags & IPU_DI_CLKMODE_SYNC) {
+			/*
+			 * CLKMODE_SYNC means that we want the DI to be
+			 * clocked at the same rate as the parent clock.
+			 * This is needed (eg) for LDB which needs to be
+			 * fed with the same pixel clock.  We assume that
+			 * the LDB clock has already been set correctly.
+			 */
+			clkgen0 = 1 << 4;
+		} else {
+			/*
+			 * We can use the divider.  We should really have
+			 * a flag here indicating whether the bridge can
+			 * cope with a fractional divider or not.  For the
+			 * time being, let's go for simplicity and
+			 * reliability.
+			 */
+			unsigned long in_rate;
+			unsigned div;
 
-	if ((sig->v_sync_width == 0) || (sig->h_sync_width == 0))
-		return -EINVAL;
+			clk_set_rate(clk, sig->pixelclock);
 
-	dev_dbg(di->ipu->dev, "Clocks: IPU %luHz DI %luHz Needed %luHz\n",
-		clk_get_rate(di->clk_ipu),
-		clk_get_rate(di->clk_di),
-		sig->pixelclock);
+			in_rate = clk_get_rate(clk);
+			div = (in_rate + sig->pixelclock / 2) / sig->pixelclock;
+			if (div == 0)
+				div = 1;
 
-	/*
-	 * CLKMODE_EXT means we must use the DI clock: this is needed
-	 * for things like LVDS which needs to feed the DI and LDB with
-	 * the same pixel clock.
-	 *
-	 * For other interfaces, we can arbitarily select between the DI
-	 * specific clock and the internal IPU clock.  See DI_GENERAL
-	 * bit 20.  We select the IPU clock if it can give us a clock
-	 * rate within 1% of the requested frequency, otherwise we use
-	 * the DI clock.
-	 */
-	round = sig->pixelclock;
-	if (sig->clkflags & IPU_DI_CLKMODE_EXT) {
-		parent = di->clk_di;
+			clkgen0 = div << 4;
+		}
 	} else {
+		/*
+	 * For other interfaces, we can arbitrarily select between
+		 * the DI specific clock and the internal IPU clock.  See
+		 * DI_GENERAL bit 20.  We select the IPU clock if it can
+		 * give us a clock rate within 1% of the requested frequency,
+		 * otherwise we use the DI clock.
+		 */
 		unsigned long rate, clkrate;
 		unsigned div, error;
 
@@ -578,54 +464,80 @@  int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
 
 		/* Allow a 1% error */
 		if (error < 1010 && error >= 990) {
-			parent = di->clk_ipu;
+			clk = di->clk_ipu;
+
+			clkgen0 = div << 4;
 		} else {
-			parent = di->clk_di;
+			unsigned long in_rate;
+			unsigned div;
+
+			clk = di->clk_di;
 
-			ret = clk_set_rate(parent, sig->pixelclock);
-			if (ret)
-				dev_err(di->ipu->dev, "Setting of DI clock failed: %d\n", ret);
+			clk_set_rate(clk, sig->pixelclock);
 
-			/* Use the integer divisor rate - avoid fractional dividers */
-			round = rate;
+			in_rate = clk_get_rate(clk);
+			div = (in_rate + sig->pixelclock / 2) / sig->pixelclock;
+			if (div == 0)
+				div = 1;
+
+			clkgen0 = div << 4;
 		}
 	}
 
-	ret = clk_set_parent(di->clk_di_pixel, parent);
-	if (ret) {
-		dev_err(di->ipu->dev,
-			"setting pixel clock to parent %s failed with %d\n",
-				__clk_get_name(parent), ret);
-		return ret;
-	}
+	di->clk_di_pixel = clk;
+
+	/* Set the divider */
+	ipu_di_write(di, clkgen0, DI_BS_CLKGEN0);
 
 	/*
-	 * CLKMODE_SYNC means that we want the DI to be clocked at the
-	 * same rate as the parent clock.  This is needed (eg) for LDB
-	 * which needs to be fed with the same pixel clock.
-	 *
-	 * Note: clk_set_rate(clk, clk_round_rate(clk, rate)) is the
-	 * same as clk_set_rate(clk, rate);
+	 * Set the high/low periods.  Bits 24:16 give us the falling edge,
+	 * and bits 8:0 give the rising edge.  LSB is fraction, and is
+	 * based on the divider above.  We want a 50% duty cycle, so set
+	 * the falling edge to be half the divider.
 	 */
-	if (sig->clkflags & IPU_DI_CLKMODE_SYNC)
-		round = clk_get_rate(parent);
+	ipu_di_write(di, (clkgen0 >> 4) << 16, DI_BS_CLKGEN1);
 
-	ret = clk_set_rate(di->clk_di_pixel, round);
+	/* Finally select the input clock */
+	val = ipu_di_read(di, DI_GENERAL) & ~DI_GEN_DI_CLK_EXT;
+	if (clk == di->clk_di)
+		val |= DI_GEN_DI_CLK_EXT;
+	ipu_di_write(di, val, DI_GENERAL);
 
-	dev_dbg(di->ipu->dev, "Want %luHz IPU %luHz DI %luHz using %s, got %luHz\n",
+	dev_dbg(di->ipu->dev, "Want %luHz IPU %luHz DI %luHz using %s, %luHz\n",
 		sig->pixelclock,
 		clk_get_rate(di->clk_ipu),
 		clk_get_rate(di->clk_di),
-		parent == di->clk_di ? "DI" : "IPU",
-		clk_get_rate(di->clk_di_pixel));
+		clk == di->clk_di ? "DI" : "IPU",
+		clk_get_rate(di->clk_di_pixel) / (clkgen0 >> 4));
+}
+
+int ipu_di_init_sync_panel(struct ipu_di *di, struct ipu_di_signal_cfg *sig)
+{
+	u32 reg;
+	u32 di_gen, vsync_cnt;
+	u32 div;
+	u32 h_total, v_total;
+
+	dev_dbg(di->ipu->dev, "disp %d: panel size = %d x %d\n",
+		di->id, sig->width, sig->height);
+
+	if ((sig->v_sync_width == 0) || (sig->h_sync_width == 0))
+		return -EINVAL;
 
 	h_total = sig->width + sig->h_sync_width + sig->h_start_width +
 		sig->h_end_width;
 	v_total = sig->height + sig->v_sync_width + sig->v_start_width +
 		sig->v_end_width;
 
+	dev_dbg(di->ipu->dev, "Clocks: IPU %luHz DI %luHz Needed %luHz\n",
+		clk_get_rate(di->clk_ipu),
+		clk_get_rate(di->clk_di),
+		sig->pixelclock);
+
 	mutex_lock(&di_mutex);
 
+	ipu_di_config_clock(di, sig);
+
 	div = ipu_di_read(di, DI_BS_CLKGEN0) & 0xfff;
 	div = div / 16;		/* Now divider is integer portion */
 
@@ -709,7 +621,11 @@  EXPORT_SYMBOL_GPL(ipu_di_init_sync_panel);
 
 int ipu_di_enable(struct ipu_di *di)
 {
-	int ret = clk_prepare_enable(di->clk_di_pixel);
+	int ret;
+
+	WARN_ON(IS_ERR(di->clk_di_pixel));
+
+	ret = clk_prepare_enable(di->clk_di_pixel);
 	if (ret)
 		return ret;
 
@@ -721,6 +637,8 @@  EXPORT_SYMBOL_GPL(ipu_di_enable);
 
 int ipu_di_disable(struct ipu_di *di)
 {
+	WARN_ON(IS_ERR(di->clk_di_pixel));
+
 	ipu_module_disable(di->ipu, di->module);
 
 	clk_disable_unprepare(di->clk_di_pixel);
@@ -776,13 +694,6 @@  int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
 		u32 module, struct clk *clk_ipu)
 {
 	struct ipu_di *di;
-	int ret;
-	const char *di_parent[2];
-	struct clk_init_data init = {
-		.ops = &clk_di_ops,
-		.num_parents = 2,
-		.flags = 0,
-	};
 
 	if (id > 1)
 		return -ENODEV;
@@ -804,45 +715,16 @@  int ipu_di_init(struct ipu_soc *ipu, struct device *dev, int id,
 	if (!di->base)
 		return -ENOMEM;
 
-	di_parent[0] = __clk_get_name(di->clk_ipu);
-	di_parent[1] = __clk_get_name(di->clk_di);
-
 	ipu_di_write(di, 0x10, DI_BS_CLKGEN0);
 
-	init.parent_names = (const char **)&di_parent;
-	di->clk_name = kasprintf(GFP_KERNEL, "%s_di%d_pixel",
-			dev_name(dev), id);
-	if (!di->clk_name)
-		return -ENOMEM;
-
-	init.name = di->clk_name;
-
-	di->clk_hw_out.init = &init;
-	di->clk_di_pixel = clk_register(dev, &di->clk_hw_out);
-
-	if (IS_ERR(di->clk_di_pixel)) {
-		ret = PTR_ERR(di->clk_di_pixel);
-		goto failed_clk_register;
-	}
-
 	dev_dbg(dev, "DI%d base: 0x%08lx remapped to %p\n",
 			id, base, di->base);
 	di->inuse = false;
 	di->ipu = ipu;
 
 	return 0;
-
-failed_clk_register:
-
-	kfree(di->clk_name);
-
-	return ret;
 }
 
 void ipu_di_exit(struct ipu_soc *ipu, int id)
 {
-	struct ipu_di *di = ipu->di_priv[id];
-
-	clk_unregister(di->clk_di_pixel);
-	kfree(di->clk_name);
 }