[RFC,v2,1/8] video: tegra: Add nvhost driver

Message ID	1353935954-13763-2-git-send-email-tbergstrom@nvidia.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org> From: Terje Bergstrom <tbergstrom@nvidia.com> To: <thierry.reding@avionic-design.de>, <linux-tegra@vger.kernel.org>, <dri-devel@lists.freedesktop.org> Subject: [RFC v2 1/8] video: tegra: Add nvhost driver Date: Mon, 26 Nov 2012 15:19:07 +0200 Message-ID: <1353935954-13763-2-git-send-email-tbergstrom@nvidia.com> In-Reply-To: <1353935954-13763-1-git-send-email-tbergstrom@nvidia.com> References: <1353935954-13763-1-git-send-email-tbergstrom@nvidia.com> MIME-Version: 1.0 Cc: linux-kernel@vger.kernel.org Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org Errors-To: dri-devel-bounces+patchwork-dri-devel=patchwork.kernel.org@lists.freedesktop.org

Terje Bergstrom Nov. 26, 2012, 1:19 p.m. UTC

Add nvhost, the driver for host1x. This patch adds support for reading and
incrementing sync points and dynamic power management.

Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/video/Kconfig                              |    2 +
 drivers/video/Makefile                             |    2 +
 drivers/video/tegra/host/Kconfig                   |    5 +
 drivers/video/tegra/host/Makefile                  |   10 +
 drivers/video/tegra/host/chip_support.c            |   48 ++
 drivers/video/tegra/host/chip_support.h            |   52 +++
 drivers/video/tegra/host/dev.c                     |   96 ++++
 drivers/video/tegra/host/host1x/Makefile           |    7 +
 drivers/video/tegra/host/host1x/host1x.c           |  204 +++++++++
 drivers/video/tegra/host/host1x/host1x.h           |   78 ++++
 drivers/video/tegra/host/host1x/host1x01.c         |   37 ++
 drivers/video/tegra/host/host1x/host1x01.h         |   29 ++
 .../video/tegra/host/host1x/host1x01_hardware.h    |   36 ++
 drivers/video/tegra/host/host1x/host1x_syncpt.c    |  156 +++++++
 drivers/video/tegra/host/host1x/hw_host1x01_sync.h |  398 ++++++++++++++++
 drivers/video/tegra/host/nvhost_acm.c              |  481 ++++++++++++++++++++
 drivers/video/tegra/host/nvhost_acm.h              |   45 ++
 drivers/video/tegra/host/nvhost_syncpt.c           |  333 ++++++++++++++
 drivers/video/tegra/host/nvhost_syncpt.h           |  136 ++++++
 include/linux/nvhost.h                             |  143 ++++++
 20 files changed, 2298 insertions(+)
 create mode 100644 drivers/video/tegra/host/Kconfig
 create mode 100644 drivers/video/tegra/host/Makefile
 create mode 100644 drivers/video/tegra/host/chip_support.c
 create mode 100644 drivers/video/tegra/host/chip_support.h
 create mode 100644 drivers/video/tegra/host/dev.c
 create mode 100644 drivers/video/tegra/host/host1x/Makefile
 create mode 100644 drivers/video/tegra/host/host1x/host1x.c
 create mode 100644 drivers/video/tegra/host/host1x/host1x.h
 create mode 100644 drivers/video/tegra/host/host1x/host1x01.c
 create mode 100644 drivers/video/tegra/host/host1x/host1x01.h
 create mode 100644 drivers/video/tegra/host/host1x/host1x01_hardware.h
 create mode 100644 drivers/video/tegra/host/host1x/host1x_syncpt.c
 create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_sync.h
 create mode 100644 drivers/video/tegra/host/nvhost_acm.c
 create mode 100644 drivers/video/tegra/host/nvhost_acm.h
 create mode 100644 drivers/video/tegra/host/nvhost_syncpt.c
 create mode 100644 drivers/video/tegra/host/nvhost_syncpt.h
 create mode 100644 include/linux/nvhost.h

Sivaram Nair Nov. 27, 2012, 10:52 a.m. UTC | #1

On Mon, Nov 26, 2012 at 02:19:07PM +0100, Terje Bergstrom wrote:
> +
> +struct nvhost_chip_support *nvhost_chip_ops;

should be static?

> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> +       int err;
> +
> +       err = nvhost_init_chip_support(host);
> +       if (err)
> +               return err;
> +
> +       return 0;

nit: why not just return err - the 'if(err)' is unnecessary)?

> +
> +       nvhost = host;

I think this should be delayed until the init is complete as this
variable is not cleared if there is a failure during init. Also I feel
that the name nvhost is a bit short for an exported variable.

> +static void to_state_running_locked(struct platform_device *dev)
> +{
> +       struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +       int prev_state = pdata->powerstate;
> +
> +       if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED)
> +               to_state_clockgated_locked(dev);
> +
> +       if (pdata->powerstate == NVHOST_POWER_STATE_CLOCKGATED) {
> +               int i;
> +
> +               if (dev->dev.parent)
> +                       nvhost_module_busy(to_platform_device(dev->dev.parent));
> +
> +               for (i = 0; i < pdata->num_clks; i++) {
> +                       int err = clk_prepare_enable(pdata->clk[i]);
> +                       if (err) {
> +                               dev_err(&dev->dev, "Cannot turn on clock %s",
> +                                       pdata->clocks[i].name);
> +                               return;

In case of an error, returning here leaves some clocks turned on. 

Sivaram

Thierry Reding Nov. 28, 2012, 9:23 p.m. UTC | #2

On Mon, Nov 26, 2012 at 03:19:07PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index fb9a14e..94c861b 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -2463,4 +2463,6 @@ config FB_SH_MOBILE_MERAM
>  	  Up to 4 memory channels can be configured, allowing 4 RGB or
>  	  2 YCbCr framebuffers to be configured.
>  
> +source "drivers/video/tegra/host/Kconfig"
> +

This could be problematic. Since drivers/video and drivers/gpu/drm are
separate trees, this would entail a continuous burden on keeping both
trees synchronized. While I realize that eventually it might be better
to put the host1x driver in a separate place to accomodate for its use
by other subsystems, I'm not sure moving it here right away is the best
approach.

I'm not sure drivers/video is the best location either. Perhaps
drivers/bus would be better? Or maybe we need a new subdirectory for
this kind of device.

> diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
> new file mode 100644
> index 0000000..5a44147
> --- /dev/null
> +++ b/drivers/video/tegra/host/chip_support.c
> @@ -0,0 +1,48 @@
> +/*
> + * drivers/video/tegra/host/chip_support.c

I think the general nowadays is to no longer use filenames in comments.

[...]
> +struct nvhost_chip_support *nvhost_chip_ops;
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
> +{
> +	return nvhost_chip_ops;
> +}

This seems like it should be more tightly coupled to the host1x device.
And it shouldn't be a global variable.

> +
> +int nvhost_init_chip_support(struct nvhost_master *host)
> +{
> +	if (nvhost_chip_ops == NULL) {
> +		nvhost_chip_ops = kzalloc(sizeof(*nvhost_chip_ops), GFP_KERNEL);
> +		if (nvhost_chip_ops == NULL) {
> +			pr_err("%s: Cannot allocate nvhost_chip_support\n",
> +				__func__);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	nvhost_init_host1x01_support(host, nvhost_chip_ops);
> +	return 0;
> +}

We also don't need this. This should really be done by the central
host1x device's initialization.

> diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h
[...]
> +struct output;

What's this? It doesn't seem to be used anywhere.

> +struct nvhost_master;

Why do you suffix this with _master? The whole point of host1x is to be
the "master" so you can just as well call it nvhost, right? Ideally
you'd call it host1x, but I'm repeating myself. =)

> +struct nvhost_syncpt;
> +struct platform_device;
> +
> +struct nvhost_syncpt_ops {
> +	void (*reset)(struct nvhost_syncpt *, u32 id);
> +	void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
> +	void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
> +	u32 (*update_min)(struct nvhost_syncpt *, u32 id);
> +	void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
> +	void (*debug)(struct nvhost_syncpt *);
> +	const char * (*name)(struct nvhost_syncpt *, u32 id);
> +};

Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
be compatible when it comes to handling syncpoints. I thought they would
even be compatible in all other aspects as well, so why even have this?

> +
> +struct nvhost_chip_support {
> +	const char *soc_name;
> +	struct nvhost_syncpt_ops syncpt;
> +};
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void);
> +
> +#define syncpt_op()		(nvhost_get_chip_ops()->syncpt)

You really shouldn't be doing this, but rather use explicit accesses for
these structures. If you're design doesn't scatter these definitions
across several files then it isn't difficult to obtain the correct
pointers and you don't need these "shortcuts".

> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
[...]
> +u32 host1x_syncpt_incr_max(u32 id, u32 incrs)
> +{
> +	struct nvhost_syncpt *sp = &nvhost->syncpt;
> +	return nvhost_syncpt_incr_max(sp, id, incrs);
> +}
> +EXPORT_SYMBOL(host1x_syncpt_incr_max);

This API looks odd. Should syncpoints not be considered as regular
resources, much like interrupts? In that case it would be easier to
abstract them away behind an opaque type. It looks like you already use
the struct nvhost_syncpt to refer to the set of syncpoints associated
with a host1x device.

How about you use nvhost/host1x_syncpt to refer to individual syncpoints
instead. You could export an array of those from your host1x device and
implement a basic resource allocation mechanism on top, similar to how
other resources are handled in the kernel.

So a host1x client device could call host1x_request_syncpt() to allocate
a syncpoint from it's host1x parent dynamically along with passing a
name and a syncpoint handler to it.

> +
> +void host1x_syncpt_incr(u32 id)
> +{
> +	struct nvhost_syncpt *sp = &nvhost->syncpt;
> +	nvhost_syncpt_incr(sp, id);
> +}
> +EXPORT_SYMBOL(host1x_syncpt_incr);

Similarly, instead of passing an integer here, host1x clients would pass
a pointer to the requested syncpoint instead.

> +bool host1x_powered(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_powered);
> +
> +void host1x_busy(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_busy);
> +
> +void host1x_idle(struct platform_device *dev)
> +{
[...]
> +}
> +EXPORT_SYMBOL(host1x_idle);

These look like a reimplementation of the runtime power-management
framework.

> diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
[...]
> +struct nvhost_master *nvhost;

Bad habbit. I know that this is a popular shortcut. However this also
leads to very bad designs because you're allowed to reuse this pointer
from wherever you like.

When I wrote the tegra-drm code I explicitly made sure to not use any
such global variable. In the end it forces you to clean up the driver
design.

As a bonus you automatically get support for any number of host1x
devices on the same SoC. Now you will probably tell me that this is
never going to happen. People also used to think that a computers would
never use more than a single CPU...

> +static void power_on_host(struct platform_device *dev)
> +{
> +	struct nvhost_master *host = nvhost_get_private_data(dev);
> +
> +	nvhost_syncpt_reset(&host->syncpt);
> +}
> +
> +static int power_off_host(struct platform_device *dev)
> +{
> +	struct nvhost_master *host = nvhost_get_private_data(dev);
> +
> +	nvhost_syncpt_save(&host->syncpt);
> +	return 0;
> +}

These seem like possible candidates for runtime PM.

> +
> +static void nvhost_free_resources(struct nvhost_master *host)
> +{
> +}

This should be removed since it's empty.

> +
> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> +	int err;
> +
> +	err = nvhost_init_chip_support(host);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +}

Again, this chip support concept is not useful, so this function can go
away as well. Also nvhost_init_chip_support() doesn't allocate any
resources so it shouldn't be called from this function in the first
place.

> +
> +static int __devinit nvhost_probe(struct platform_device *dev)
> +{
> +	struct nvhost_master *host;
> +	struct resource *regs, *intr0, *intr1;
> +	int i, err;
> +	struct nvhost_device_data *pdata =
> +		(struct nvhost_device_data *)dev->dev.platform_data;

Platform data should not be used. Tegra is DT only.

> +	regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
> +	intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
> +	intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
> +
> +	if (!regs || !intr0 || !intr1) {

I prefer to have these checked for explicitly, one by one for
readability and potentially more useful diagnostics.

Also you should be using platform_get_irq() for interrupts. Furthermore
the host1x DT node (and the TRM) name the interrupts "syncpt" and
"general", so maybe those would be more useful variable names than
"intr0" and "intr1".

But since you don't use them anyway they shouldn't be part of this
patch.

> +	host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL);
> +	if (!host)
> +		return -ENOMEM;
> +
> +	nvhost = host;
> +
> +	host->dev = dev;
> +
> +	/* Copy host1x parameters. The private_data gets replaced
> +	 * by nvhost_master later */

Multiline comments should be in this format:

	/*
	 * foo
	 */

> +	memcpy(&host->info, pdata->private_data,
> +			sizeof(struct host1x_device_info));

I don't think passing data in this way shouldn't be necessary as
discussed in the subthread on the Tegra AUXDATA.

> +
> +	pdata->finalize_poweron = power_on_host;
> +	pdata->prepare_poweroff = power_off_host;
> +
> +	pdata->pdev = dev;
> +
> +	/* set common host1x device data */
> +	platform_set_drvdata(dev, pdata);
> +
> +	/* set private host1x device data */
> +	nvhost_set_private_data(dev, host);
> +
> +	host->aperture = devm_request_and_ioremap(&dev->dev, regs);
> +	if (!host->aperture) {

aperture is confusing as it is typically used for GTT-type memory
regions, so it may be mistaken for the GART found on Tegra 2. Why not
call it "regs" instead?

> +		dev_err(&dev->dev, "failed to remap host registers\n");

This is unnecessary. devm_request_and_ioremap() already prints an error
message on failure.

> +	for (i = 0; i < pdata->num_clks; i++)
> +		clk_prepare_enable(pdata->clk[i]);
> +	nvhost_syncpt_reset(&host->syncpt);
> +	for (i = 0; i < pdata->num_clks; i++)
> +		clk_disable_unprepare(pdata->clk[i]);

Stephen already hinted at this when discussing the AUXDATA. You should
explicitly request the clocks.

> +static int __exit nvhost_remove(struct platform_device *dev)

This should really be __devexit to allow the driver to be built as a
module. However, __dev* are deprecated and in the process of being
removed so you can just drop __exit as well.

> +static struct of_device_id host1x_match[] __devinitdata = {

__devinitdata can be dropped.

> +	{ .compatible = "nvidia,tegra20-host1x", },
> +	{ .compatible = "nvidia,tegra30-host1x", },
> +	{ },
> +};
> +
> +static struct platform_driver platform_driver = {
> +	.probe = nvhost_probe,
> +	.remove = __exit_p(nvhost_remove),

__exit_p also.

> +	.suspend = nvhost_suspend,
> +	.resume = nvhost_resume,
> +	.driver = {
> +		.owner = THIS_MODULE,
> +		.name = DRIVER_NAME,
> +		.of_match_table = of_match_ptr(host1x_match),

No need for of_match_ptr().

> +static int __init nvhost_mod_init(void)
> +{
> +	return platform_driver_register(&platform_driver);
> +}
> +
> +static void __exit nvhost_mod_exit(void)
> +{
> +	platform_driver_unregister(&platform_driver);
> +}
> +
> +module_init(nvhost_mod_init);
> +module_exit(nvhost_mod_exit);

Use module_platform_driver().

> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
[...]
> +#define TRACE_MAX_LENGTH	128U
> +#define IFACE_NAME		"nvhost"

None of these seem to be used.

> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
> +{
> +	struct nvhost_device_data *pdata =
> +		(struct nvhost_device_data *)platform_get_drvdata(_dev);
> +	WARN_ON(!pdata);
> +	return (pdata && pdata->private_data) ? pdata->private_data : NULL;
> +}
> +
> +static inline void nvhost_set_private_data(struct platform_device *_dev,
> +	void *priv_data)
> +{
> +	struct nvhost_device_data *pdata =
> +		(struct nvhost_device_data *)platform_get_drvdata(_dev);
> +	WARN_ON(!pdata);
> +	if (pdata)
> +		pdata->private_data = priv_data;
> +}

You should need none of these. Instead put all the data you need into
you struct host1x and associate that with the platform device using
platform_set_drvdata().

> +static inline
> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
> +{
> +	struct platform_device *pdev;
> +
> +	if (_dev->dev.parent) {
> +		pdev = to_platform_device(_dev->dev.parent);
> +		return nvhost_get_private_data(pdev);
> +	} else
> +		return nvhost_get_private_data(_dev);
> +}
> +
> +static inline
> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
> +{
> +	return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
> +}

These don't seem to be used.

> diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c
> new file mode 100644
> index 0000000..d53302d
> --- /dev/null
> +++ b/drivers/video/tegra/host/host1x/host1x01.c
> @@ -0,0 +1,37 @@
> +/*
> + * drivers/video/tegra/host/host1x01.c
> + *
> + * Host1x init for T20 and T30 Architecture Chips
> + *
> + * Copyright (c) 2011-2012, NVIDIA Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/nvhost.h>
> +
> +#include "host1x/host1x01.h"
> +#include "host1x/host1x.h"
> +#include "host1x/host1x01_hardware.h"
> +#include "chip_support.h"
> +
> +#include "host1x/host1x_syncpt.c"
> +
> +int nvhost_init_host1x01_support(struct nvhost_master *host,
> +	struct nvhost_chip_support *op)
> +{
> +	host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;

Usually you don't keep separate variables for subregions. This can
equally well be done with just adding a corresponding offset.

Then again, I already said that this whole chip support concept is
unnecessary and can be dropped.

> diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c
[...]
> +/**
> + * Write the current syncpoint value back to hw.
> + */
> +static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id)
> +{
> +	struct nvhost_master *dev = syncpt_to_dev(sp);
> +	int min = nvhost_syncpt_read_min(sp, id);
> +	writel(min, dev->sync_aperture + (host1x_sync_syncpt_0_r() + id * 4));
> +}

Again, better to represent individual syncpoints with opaque pointers
and dereference them here. Obviously this file will need access to the
structure definition.

> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
> +{
> +	u32 i;
> +	for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> +		u32 max = nvhost_syncpt_read_max(sp, i);
> +		u32 min = nvhost_syncpt_update_min(sp, i);
> +		if (!max && !min)
> +			continue;
> +		dev_info(&syncpt_to_dev(sp)->dev->dev,
> +			"id %d (%s) min %d max %d\n",
> +			i, syncpt_op().name(sp, i),
> +			min, max);
> +
> +	}
> +
> +	for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
> +		u32 base_val;
> +		host1x_syncpt_read_wait_base(sp, i);
> +		base_val = sp->base_val[i];
> +		if (base_val)
> +			dev_info(&syncpt_to_dev(sp)->dev->dev,
> +					"waitbase id %d val %d\n",
> +					i, base_val);
> +
> +	}
> +}

This should probably be integrated with debugfs.

> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h

Autogenerated files are generally not acceptable. And I already
mentioned before that you should be using #define instead of static
inline functions for register and bit definitions.

> diff --git a/drivers/video/tegra/host/nvhost_acm.c b/drivers/video/tegra/host/nvhost_acm.c
[...]

This whole file largely looks like a reimplementation of runtime PM. You
should investigate if you can't reuse the existing infrastructure.

> +	/* Init the power sysfs attributes for this device */
> +	pdata->power_attrib = devm_kzalloc(&dev->dev,
> +			sizeof(struct nvhost_device_power_attr),
> +		GFP_KERNEL);
> +	if (!pdata->power_attrib) {
> +		dev_err(&dev->dev, "Unable to allocate sysfs attributes\n");
> +		return -ENOMEM;
> +	}
> +	pdata->power_attrib->ndev = dev;
> +
> +	pdata->power_kobj = kobject_create_and_add("acm", &dev->dev.kobj);
> +	if (!pdata->power_kobj) {
> +		dev_err(&dev->dev, "Could not add dir 'power'\n");
> +		err = -EIO;
> +		goto fail_attrib_alloc;
> +	}
> +
> +	attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY];
> +	attr->attr.name = "clockgate_delay";
> +	attr->attr.mode = S_IWUSR | S_IRUGO;
> +	attr->show = clockgate_delay_show;
> +	attr->store = clockgate_delay_store;
> +	if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> +		dev_err(&dev->dev, "Could not create sysfs attribute clockgate_delay\n");
> +		err = -EIO;
> +		goto fail_clockdelay;
> +	}
> +
> +	attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY];
> +	attr->attr.name = "powergate_delay";
> +	attr->attr.mode = S_IWUSR | S_IRUGO;
> +	attr->show = powergate_delay_show;
> +	attr->store = powergate_delay_store;
> +	if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> +		dev_err(&dev->dev, "Could not create sysfs attribute powergate_delay\n");
> +		err = -EIO;
> +		goto fail_powergatedelay;
> +	}
> +
> +	attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT];
> +	attr->attr.name = "refcount";
> +	attr->attr.mode = S_IRUGO;
> +	attr->show = refcount_show;
> +	if (sysfs_create_file(pdata->power_kobj, &attr->attr)) {
> +		dev_err(&dev->dev, "Could not create sysfs attribute refcount\n");
> +		err = -EIO;
> +		goto fail_refcount;
> +	}

This is a very funky way of creating sysfs attributes. What you probably
want here are device attributes. See Documentation/filesystems/sysfs.txt
and include/linux/sysfs.h.

But if you can replace this by runtime PM, you'll get similar
functionality for free anyway.

> diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c
[...]
> +/**
> + * Returns true if syncpoint is expired, false if we may need to wait
> + */
> +bool nvhost_syncpt_is_expired(
> +	struct nvhost_syncpt *sp,
> +	u32 id,
> +	u32 thresh)
> +{
> +	u32 current_val;
> +	u32 future_val;
> +	smp_rmb();

What do you need a read memory barrier for?

> +/* Displays the current value of the sync point via sysfs */
> +static ssize_t syncpt_min_show(struct kobject *kobj,
> +		struct kobj_attribute *attr, char *buf)
> +{
> +	struct nvhost_syncpt_attr *syncpt_attr =
> +		container_of(attr, struct nvhost_syncpt_attr, attr);
> +
> +	return snprintf(buf, PAGE_SIZE, "%u",
> +			nvhost_syncpt_read(&syncpt_attr->host->syncpt,
> +				syncpt_attr->id));
> +}
> +
> +static ssize_t syncpt_max_show(struct kobject *kobj,
> +		struct kobj_attribute *attr, char *buf)
> +{
> +	struct nvhost_syncpt_attr *syncpt_attr =
> +		container_of(attr, struct nvhost_syncpt_attr, attr);
> +
> +	return snprintf(buf, PAGE_SIZE, "%u",
> +			nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
> +				syncpt_attr->id));
> +}

This looks like it belongs in debugfs.

> +int nvhost_syncpt_init(struct platform_device *dev,
> +		struct nvhost_syncpt *sp)
> +{
> +	int i;
> +	struct nvhost_master *host = syncpt_to_dev(sp);
> +	int err = 0;
> +
> +	/* Allocate structs for min, max and base values */
> +	sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> +			GFP_KERNEL);
> +	sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> +			GFP_KERNEL);
> +	sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp),
> +			GFP_KERNEL);
> +	sp->lock_counts =
> +		kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp),
> +			GFP_KERNEL);

Again, I really think that syncpoints should be objects with
corresponding attributes instead of keeping them in these arrays.

> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
[...]
> +struct host1x_device_info {
> +	int		nb_channels;	/* host1x: num channels supported */
> +	int		nb_pts;		/* host1x: num syncpoints supported */
> +	int		nb_bases;	/* host1x: num syncpoints supported */
> +	u32		client_managed; /* host1x: client managed syncpts */
> +	int		nb_mlocks;	/* host1x: number of mlocks */
> +	const char	**syncpt_names;	/* names of sync points */
> +};
> +
> +struct nvhost_device_data {
> +	int		version;	/* ip version number of device */
> +	int		id;		/* Separates clients of same hw */
> +	int		index;		/* Hardware channel number */
> +	void __iomem	*aperture;	/* Iomem mapped to kernel */
> +
> +	u32		syncpts;	/* Bitfield of sync points used */
> +	u32		modulemutexes;	/* Bit field of module mutexes */
> +
> +	u32		class;		/* Device class */
> +	bool		serialize;	/* Serialize submits in the channel */
> +
> +	int		powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
> +	bool		can_powergate;	/* True if module can be power gated */
> +	int		clockgate_delay;/* Delay before clock gated */
> +	int		powergate_delay;/* Delay before power gated */
> +	struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
> +
> +	struct delayed_work powerstate_down;/* Power state management */
> +	int		num_clks;	/* Number of clocks opened for dev */
> +	struct clk	*clk[NVHOST_MODULE_MAX_CLOCKS];
> +	struct mutex	lock;		/* Power management lock */
> +	int		powerstate;	/* Current power state */
> +	int		refcount;	/* Number of tasks active */
> +	wait_queue_head_t idle_wq;	/* Work queue for idle */
> +
> +	struct nvhost_channel *channel;	/* Channel assigned for the module */
> +	struct kobject *power_kobj;	/* kobj to hold power sysfs entries */
> +	struct nvhost_device_power_attr *power_attrib;	/* sysfs attributes */
> +	struct dentry *debugfs;		/* debugfs directory */
> +
> +	void *private_data;		/* private platform data */
> +	struct platform_device *pdev;	/* owner platform_device */
> +
> +	/* Finalize power on. Can be used for context restore. */
> +	void (*finalize_poweron)(struct platform_device *dev);
> +
> +	/* Device is busy. */
> +	void (*busy)(struct platform_device *);
> +
> +	/* Device is idle. */
> +	void (*idle)(struct platform_device *);
> +
> +	/* Device is going to be suspended */
> +	void (*suspend_ndev)(struct platform_device *);
> +
> +	/* Device is initialized */
> +	void (*init)(struct platform_device *dev);
> +
> +	/* Device is de-initialized. */
> +	void (*deinit)(struct platform_device *dev);
> +
> +	/* Preparing for power off. Used for context save. */
> +	int (*prepare_poweroff)(struct platform_device *dev);
> +
> +	/* Clock gating callbacks */
> +	int (*prepare_clockoff)(struct platform_device *dev);
> +	void (*finalize_clockon)(struct platform_device *dev);
> +};

A lot of this can be removed if you use existing infrastructure and
simplify the design a bit. Most of it can probably move into the main
struct host1x to avoid needless indirections that make the code hard to
follow and maintain.

Thierry

Mark Zhang Nov. 29, 2012, 9:10 a.m. UTC | #3

On 11/26/2012 09:19 PM, Terje Bergström <tbergstrom@nvidia.com> wrote:
> Add nvhost, the driver for host1x. This patch adds support for reading and
> incrementing sync points and dynamic power management.
> 
> Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
> 
> ---
> drivers/video/Kconfig                              |    2 +
>  drivers/video/Makefile                             |    2 +
>  drivers/video/tegra/host/Kconfig                   |    5 +
>  drivers/video/tegra/host/Makefile                  |   10 +
>  drivers/video/tegra/host/chip_support.c            |   48 ++
>  drivers/video/tegra/host/chip_support.h            |   52 +++
>  drivers/video/tegra/host/dev.c                     |   96 ++++
>  drivers/video/tegra/host/host1x/Makefile           |    7 +
>  drivers/video/tegra/host/host1x/host1x.c           |  204 +++++++++
>  drivers/video/tegra/host/host1x/host1x.h           |   78 ++++
>  drivers/video/tegra/host/host1x/host1x01.c         |   37 ++
>  drivers/video/tegra/host/host1x/host1x01.h         |   29 ++
>  .../video/tegra/host/host1x/host1x01_hardware.h    |   36 ++
>  drivers/video/tegra/host/host1x/host1x_syncpt.c    |  156 +++++++
>  drivers/video/tegra/host/host1x/hw_host1x01_sync.h |  398 ++++++++++++++++
>  drivers/video/tegra/host/nvhost_acm.c              |  481 ++++++++++++++++++++
>  drivers/video/tegra/host/nvhost_acm.h              |   45 ++
>  drivers/video/tegra/host/nvhost_syncpt.c           |  333 ++++++++++++++
>  drivers/video/tegra/host/nvhost_syncpt.h           |  136 ++++++
>  include/linux/nvhost.h                             |  143 ++++++
>  20 files changed, 2298 insertions(+)
[...]
> diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c
> +#include "chip_support.h"
> +#include "host1x/host1x01.h"
> +
> +struct nvhost_chip_support *nvhost_chip_ops;
> +
> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
> +{
> +	return nvhost_chip_ops;
> +}

If you wanna hide "nvhost_chip_ops" from other source files, declare it
as "static". So this is not a static member which means other files is
able to touch it by "extern" but we also define a function to get it,
and this looks redundant.

[...]
> diff --git a/drivers/video/tegra/host/host1x/Makefile b/drivers/video/tegra/host/host1x/Makefile
> new file mode 100644
> index 0000000..330d507
> --- /dev/null
> +++ b/drivers/video/tegra/host/host1x/Makefile
> @@ -0,0 +1,7 @@
> +ccflags-y = -Idrivers/video/tegra/host
> +
> +nvhost-host1x-objs  = \
> +	host1x.o \
> +	host1x01.o

Can we rename this "host1x01.c"? I just really don't like this kind of
variables/files, I mean, I can't imagine the purpose of the file
according to it's name...

[...]
> +
> +static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
> +{
> +	int err;
> +
> +	err = nvhost_init_chip_support(host);
> +	if (err)
> +		return err;
> +
> +	return 0;
> +}

Just "return nvhost_init_chip_support(host)" is enough. If so, do we
still need this function?

[...]
> +
> +static int __devinit nvhost_probe(struct platform_device *dev)
> +
[...]
> +	dev_info(&dev->dev, "initialized\n");
> +
> +	return 0;
> +
> +fail:

Add more "free" codes here. Actually, "nvhost_free_resources" frees the
host->intr.syncpt which is not needed to free manually.
Seems at least we need to add "nvhost_syncpt_deinit" here.

[...]
> +
> +static struct of_device_id host1x_match[] __devinitdata = {
> +	{ .compatible = "nvidia,tegra20-host1x", },
> +	{ .compatible = "nvidia,tegra30-host1x", },

Again, place tegra30-host1x before tegra20-host1x.

[...]
> +
> +/**
> + * Write a cpu syncpoint increment to the hardware, without touching
> + * the cache. Caller is responsible for host being powered.
> + */
> +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
> +{
> +	struct nvhost_master *dev = syncpt_to_dev(sp);
> +	u32 reg_offset = id / 32;
> +
> +	if (!nvhost_module_powered(dev->dev)) {
> +		dev_err(&syncpt_to_dev(sp)->dev->dev,
> +			"Trying to access host1x when it's off");
> +		return;
> +	}
> +
> +	if (!nvhost_syncpt_client_managed(sp, id)
> +			&& nvhost_syncpt_min_eq_max(sp, id)) {
> +		dev_err(&syncpt_to_dev(sp)->dev->dev,
> +			"Trying to increment syncpoint id %d beyond max\n",
> +			id);
> +		return;
> +	}
> +	writel(BIT_MASK(id), dev->sync_aperture +
> +			host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);

I have a stupid question: According to the name and the context of this
function, seems it increases the syncpt value which specified by param
"id". So how does this "writel" increase the value? I don't know much
about host1x/syncpt reg operations, so could you explain a little bit or
I just completely have a wrong understanding?

[...]
> +
> +static ssize_t powergate_delay_store(struct kobject *kobj,
> +	struct kobj_attribute *attr, const char *buf, size_t count)
> +{
> +	int powergate_delay = 0, ret = 0;
> +	struct nvhost_device_power_attr *power_attribute =
> +		container_of(attr, struct nvhost_device_power_attr,
> +			power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]);
> +	struct platform_device *dev = power_attribute->ndev;
> +	struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> +	if (!pdata->can_powergate) {
> +		dev_info(&dev->dev, "does not support power-gating\n");
> +		return count;
> +	}
> +
> +	mutex_lock(&pdata->lock);
> +	ret = sscanf(buf, "%d", &powergate_delay);
> +	if (ret == 1 && powergate_delay >= 0)
> +		pdata->powergate_delay = powergate_delay;
> +	else
> +		dev_err(&dev->dev, "Invalid powergate delay\n");
> +	mutex_unlock(&pdata->lock);
> +
> +	return count;

Why we need to return an unchanged param? Seems param "count" doesn't
make sense here.

[...]
> +
> +int nvhost_module_init(struct platform_device *dev)
> +{
> +	int i = 0, err = 0;
> +	struct kobj_attribute *attr = NULL;
> +	struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> +	/* initialize clocks to known state */
> +	while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) {
> +		long rate = pdata->clocks[i].default_rate;
> +		struct clk *c;
> +
> +		c = devm_clk_get(&dev->dev, pdata->clocks[i].name);
> +		if (IS_ERR_OR_NULL(c)) {
> +			dev_err(&dev->dev, "Cannot get clock %s\n",
> +					pdata->clocks[i].name);
> +			return -ENODEV;
> +		}
> +
> +		rate = clk_round_rate(c, rate);
> +		clk_prepare_enable(c);
> +		clk_set_rate(c, rate);
> +		clk_disable_unprepare(c);
> +		pdata->clk[i] = c;
> +		i++;
> +	}
> +	pdata->num_clks = i;
> +
> +	mutex_init(&pdata->lock);
> +	init_waitqueue_head(&pdata->idle_wq);
> +	INIT_DELAYED_WORK(&pdata->powerstate_down, powerstate_down_handler);
> +
> +	/* power gate units that we can power gate */
> +	if (pdata->can_powergate) {
> +		do_powergate_locked(pdata->powergate_ids[0]);
> +		do_powergate_locked(pdata->powergate_ids[1]);

Seems we don't set these 2 powergate_ids. Does this mean we have not
enabled power management feature in this version?

[...]
> +
> +int nvhost_module_suspend(struct platform_device *dev)
> +{
> +	int ret;
> +	struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> +
> +	ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev),
> +			ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT);
> +	if (ret == 0) {
> +		dev_info(&dev->dev, "%s prevented suspend\n",
> +				dev_name(&dev->dev));
> +		return -EBUSY;
> +	}
> +

I'm not sure whether there is a race condition here. We wait until this
module is idle(refcount == 0), then try to powergate it next. But the
wait queue function "powerstate_down_handler" might already powergate
it. So we need to either  "cancel_delayed_work(&pdate->powerstate_down)"
before waiting the module to idle state or add some protection codes in
"to_state_powergated_locked".

> +	mutex_lock(&pdata->lock);
> +	cancel_delayed_work(&pdata->powerstate_down);
> +	to_state_powergated_locked(dev);
> +	mutex_unlock(&pdata->lock);
> +
> +	if (pdata->suspend_ndev)
> +		pdata->suspend_ndev(dev);
> +
> +	return 0;
> +}
> +
[...]
> +
> +int nvhost_syncpt_init(struct platform_device *dev,
> +		struct nvhost_syncpt *sp)
> +{
> +	int i;
> +	struct nvhost_master *host = syncpt_to_dev(sp);
> +	int err = 0;
> +
> +	/* Allocate structs for min, max and base values */
> +	sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> +			GFP_KERNEL);
> +	sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp),
> +			GFP_KERNEL);
> +	sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp),
> +			GFP_KERNEL);
> +	sp->lock_counts =
> +		kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp),
> +			GFP_KERNEL);
> +
> +	if (!(sp->min_val && sp->max_val && sp->base_val && sp->lock_counts)) {
> +		/* frees happen in the deinit */
> +		err = -ENOMEM;
> +		goto fail;
> +	}
> +
> +	sp->kobj = kobject_create_and_add("syncpt", &dev->dev.kobj);
> +	if (!sp->kobj) {
> +		err = -EIO;
> +		goto fail;
> +	}
> +
> +	/* Allocate two attributes for each sync point: min and max */
> +	sp->syncpt_attrs = kzalloc(sizeof(*sp->syncpt_attrs)
> +			* nvhost_syncpt_nb_pts(sp) * 2, GFP_KERNEL);
> +	if (!sp->syncpt_attrs) {
> +		err = -ENOMEM;
> +		goto fail;
> +	}
> +
> +	/* Fill in the attributes */
> +	for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> +		char name[MAX_SYNCPT_LENGTH];
> +		struct kobject *kobj;
> +		struct nvhost_syncpt_attr *min = &sp->syncpt_attrs[i*2];
> +		struct nvhost_syncpt_attr *max = &sp->syncpt_attrs[i*2+1];
> +
> +		/* Create one directory per sync point */
> +		snprintf(name, sizeof(name), "%d", i);
> +		kobj = kobject_create_and_add(name, sp->kobj);

Where do we "kobject_put" this kobj?

[...]
> +		if (!kobj) {
> +			err = -EIO;
> +			goto fail;
> +		}
> +
> +		min->id = i;
> +		min->host = host;
> +		min->attr.attr.name = min_name;
> +		min->attr.attr.mode = S_IRUGO;
> +		min->attr.show = syncpt_min_show;
> +		if (sysfs_create_file(kobj, &min->attr.attr)) {
> +			err = -EIO;
> +			goto fail;
> +		}
> +
> +		max->id = i;
> +		max->host = host;
> +		max->attr.attr.name = max_name;
> +		max->attr.attr.mode = S_IRUGO;
> +		max->attr.show = syncpt_max_show;
> +		if (sysfs_create_file(kobj, &max->attr.attr)) {
> +			err = -EIO;
> +			goto fail;
> +		}
> +	}
> +
> +	return err;
> +
> +fail:
> +	nvhost_syncpt_deinit(sp);
> +	return err;
> +}
> +
[...]
> +/* public host1x sync-point management APIs */
> +u32 host1x_syncpt_incr_max(u32 id, u32 incrs);
> +void host1x_syncpt_incr(u32 id);
> +u32 host1x_syncpt_read(u32 id);
> +
> +#endif
>

Terje Bergstrom Nov. 29, 2012, 10:21 a.m. UTC | #4

On 28.11.2012 23:23, Thierry Reding wrote:
> This could be problematic. Since drivers/video and drivers/gpu/drm are
> separate trees, this would entail a continuous burden on keeping both
> trees synchronized. While I realize that eventually it might be better
> to put the host1x driver in a separate place to accomodate for its use
> by other subsystems, I'm not sure moving it here right away is the best
> approach.

I understand your point, but I hope also that we'd end up with something
that can be used as basis for the downstream kernel to migrate to
upstream stack.

The key point here is to make the API between nvhost and tegradrm as
small and robust to changes as possible.

> I'm not sure drivers/video is the best location either. Perhaps
> drivers/bus would be better? Or maybe we need a new subdirectory for
> this kind of device.

This is a question I don't have an answer to. I'm perfectly ok moving it
wherever the public opinion leads it to.

> I think the general nowadays is to no longer use filenames in comments.

Ok, I hadn't noticed that. I'll remove them. It's redundant information
anyway.

>> +struct nvhost_chip_support *nvhost_chip_ops;
>> +
>> +struct nvhost_chip_support *nvhost_get_chip_ops(void)
>> +{
>> +     return nvhost_chip_ops;
>> +}
> 
> This seems like it should be more tightly coupled to the host1x device.
> And it shouldn't be a global variable.

Yeah, I will figure out a better way to handle the chip ops. I'm not too
happy with it. Give me a bit of time to come up with a good solution.

>> +struct output;
> 
> What's this? It doesn't seem to be used anywhere.

It's just a mistake. The struct is used in debug code, but not referred
to in this file so the forward declaration is not needed.

>> +struct nvhost_master;
> 
> Why do you suffix this with _master? The whole point of host1x is to be
> the "master" so you can just as well call it nvhost, right? Ideally
> you'd call it host1x, but I'm repeating myself. =)

Yes, the name is just a historic relict and I'm blind to them as I've
been staring at the code for so long. I think "host1x" would be a good
name for the struct.

>> +struct nvhost_syncpt_ops {
>> +     void (*reset)(struct nvhost_syncpt *, u32 id);
>> +     void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
>> +     void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
>> +     u32 (*update_min)(struct nvhost_syncpt *, u32 id);
>> +     void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
>> +     void (*debug)(struct nvhost_syncpt *);
>> +     const char * (*name)(struct nvhost_syncpt *, u32 id);
>> +};
> 
> Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
> be compatible when it comes to handling syncpoints. I thought they would
> even be compatible in all other aspects as well, so why even have this?

Tegra20 and Tegra30 are compatible, but future chips are not. I was
hoping we would be ready in upstream kernel for future chips.

>> +#define syncpt_op()          (nvhost_get_chip_ops()->syncpt)
> 
> You really shouldn't be doing this, but rather use explicit accesses for
> these structures. If you're design doesn't scatter these definitions
> across several files then it isn't difficult to obtain the correct
> pointers and you don't need these "shortcuts".

Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt
or nvhost_intr structs?

> This API looks odd. Should syncpoints not be considered as regular
> resources, much like interrupts? In that case it would be easier to
> abstract them away behind an opaque type. It looks like you already use
> the struct nvhost_syncpt to refer to the set of syncpoints associated
> with a host1x device.
> 
> How about you use nvhost/host1x_syncpt to refer to individual syncpoints
> instead. You could export an array of those from your host1x device and
> implement a basic resource allocation mechanism on top, similar to how
> other resources are handled in the kernel.
> 
> So a host1x client device could call host1x_request_syncpt() to allocate
> a syncpoint from it's host1x parent dynamically along with passing a
> name and a syncpoint handler to it.

That might work. I'll think about that - thanks.

>> +bool host1x_powered(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_powered);
>> +
>> +void host1x_busy(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_busy);
>> +
>> +void host1x_idle(struct platform_device *dev)
>> +{
> [...]
>> +}
>> +EXPORT_SYMBOL(host1x_idle);
> 
> These look like a reimplementation of the runtime power-management
> framework.

Yes, we at some point tried to move to use runtime PM. The first attempt
was thwarted by runtime PM and system suspend conflicting with each
other. I believe this is pretty much fixed in later versions of kernel.

Also, the problem was that runtime PM doesn't support multiple power
states. In downstream kernel, we support clock gating and power gating.
When we moved to runtime PM and implemented power gating on top of that,
we ended up with more code than just using the current ACM code.

I have a developer starting to look into how we could take runtime PM
again into use with proper power gating support. It'll take a while to
get that right. It might be best to rip the dynamic power management out
from this patch set, and introduce it later when we have a proper
runtime PM solution.

I'll skip the other comments about ACM because of this.

>> diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c
> [...]
>> +struct nvhost_master *nvhost;
> Bad habbit. I know that this is a popular shortcut. However this also
> leads to very bad designs because you're allowed to reuse this pointer
> from wherever you like.
> 
> When I wrote the tegra-drm code I explicitly made sure to not use any
> such global variable. In the end it forces you to clean up the driver
> design.
> 
> As a bonus you automatically get support for any number of host1x
> devices on the same SoC. Now you will probably tell me that this is
> never going to happen. People also used to think that a computers would
> never use more than a single CPU...

I think this might get cleaned up at the same time with cleaning up the
auxdata/chip_ops design. We used to have this struct set as driver
private data, but as we started using that for another purpose, we moved
this variable out.

>> +
>> +static void nvhost_free_resources(struct nvhost_master *host)
>> +{
>> +}
> 
> This should be removed since it's empty.

True. I wonder how that happened - there was content since recently, but
I guess I deleted the code without noticing that the function needs to
go, too.

>> +     regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
>> +     intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
>> +     intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
>> +
>> +     if (!regs || !intr0 || !intr1) {
> 
> I prefer to have these checked for explicitly, one by one for
> readability and potentially more useful diagnostics.

Can do.

> Also you should be using platform_get_irq() for interrupts. Furthermore
> the host1x DT node (and the TRM) name the interrupts "syncpt" and
> "general", so maybe those would be more useful variable names than
> "intr0" and "intr1".
> 
> But since you don't use them anyway they shouldn't be part of this
> patch.

True. I might also as well delete the general interrupt altogether, as
we don't use it for any real purpose.

>> +     /* Copy host1x parameters. The private_data gets replaced
>> +      * by nvhost_master later */
> 
> Multiline comments should be in this format:
> 
>         /*
>          * foo
>          */

Ok.

>> +     host->aperture = devm_request_and_ioremap(&dev->dev, regs);
>> +     if (!host->aperture) {
> 
> aperture is confusing as it is typically used for GTT-type memory
> regions, so it may be mistaken for the GART found on Tegra 2. Why not
> call it "regs" instead?

Can do.

> 
>> +             dev_err(&dev->dev, "failed to remap host registers\n");
> 
> This is unnecessary. devm_request_and_ioremap() already prints an error
> message on failure.

I'll remove that, thanks.

> 
>> +     for (i = 0; i < pdata->num_clks; i++)
>> +             clk_prepare_enable(pdata->clk[i]);
>> +     nvhost_syncpt_reset(&host->syncpt);
>> +     for (i = 0; i < pdata->num_clks; i++)
>> +             clk_disable_unprepare(pdata->clk[i]);
> 
> Stephen already hinted at this when discussing the AUXDATA. You should
> explicitly request the clocks.

I'm not too happy about that idea. The clock management code is generic
for all modules, and that's why it's driven by a data structure. Now
Stephen and you ask me to unroll the loops and make copies of the code
to facilitate different modules and different SoCs.

>> +static int __exit nvhost_remove(struct platform_device *dev)
> 
> This should really be __devexit to allow the driver to be built as a
> module. However, __dev* are deprecated and in the process of being
> removed so you can just drop __exit as well.
>> +static struct of_device_id host1x_match[] __devinitdata = {
>
> __devinitdata can be dropped.
>> +     { .compatible = "nvidia,tegra20-host1x", },
>> +     { .compatible = "nvidia,tegra30-host1x", },
>> +     { },
>> +};
>> +
>> +static struct platform_driver platform_driver = {
>> +     .probe = nvhost_probe,
>> +     .remove = __exit_p(nvhost_remove),
>
> __exit_p also.

Ok.

>> +             .of_match_table = of_match_ptr(host1x_match),
> 
> No need for of_match_ptr().

Will remove.

>> +module_init(nvhost_mod_init);
>> +module_exit(nvhost_mod_exit);
> 
> Use module_platform_driver().

Ok.

> 
>> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h
> [...]
>> +#define TRACE_MAX_LENGTH     128U
>> +#define IFACE_NAME           "nvhost"
> 
> None of these seem to be used.

Will remove.

>> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
>> +{
>> +     struct nvhost_device_data *pdata =
>> +             (struct nvhost_device_data *)platform_get_drvdata(_dev);
>> +     WARN_ON(!pdata);
>> +     return (pdata && pdata->private_data) ? pdata->private_data : NULL;
>> +}
>> +
>> +static inline void nvhost_set_private_data(struct platform_device *_dev,
>> +     void *priv_data)
>> +{
>> +     struct nvhost_device_data *pdata =
>> +             (struct nvhost_device_data *)platform_get_drvdata(_dev);
>> +     WARN_ON(!pdata);
>> +     if (pdata)
>> +             pdata->private_data = priv_data;
>> +}
> 
> You should need none of these. Instead put all the data you need into
> you struct host1x and associate that with the platform device using
> platform_set_drvdata().

I need to actually find a way to do this for both host1x itself, and the
2D module. But as said, I'll try to remove the auxdata and come up with
something better.

>> +static inline
>> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
>> +{
>> +     struct platform_device *pdev;
>> +
>> +     if (_dev->dev.parent) {
>> +             pdev = to_platform_device(_dev->dev.parent);
>> +             return nvhost_get_private_data(pdev);
>> +     } else
>> +             return nvhost_get_private_data(_dev);
>> +}
>> +
>> +static inline
>> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
>> +{
>> +     return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
>> +}
> 
> These don't seem to be used.

nvhost_get_host() is used in a subsequent patch, but getting parent
doesn't seem to be.

> Usually you don't keep separate variables for subregions. This can
> equally well be done with just adding a corresponding offset.

Hmm, ok, I could do that, but it just sounds logical to have only one
piece of code that finds the sync reg base. I don't really understand
why it needs to be duplicate for every access.

>> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
>> +{
>> +     u32 i;
>> +     for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
>> +             u32 max = nvhost_syncpt_read_max(sp, i);
>> +             u32 min = nvhost_syncpt_update_min(sp, i);
>> +             if (!max && !min)
>> +                     continue;
>> +             dev_info(&syncpt_to_dev(sp)->dev->dev,
>> +                     "id %d (%s) min %d max %d\n",
>> +                     i, syncpt_op().name(sp, i),
>> +                     min, max);
>> +
>> +     }
>> +
>> +     for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
>> +             u32 base_val;
>> +             host1x_syncpt_read_wait_base(sp, i);
>> +             base_val = sp->base_val[i];
>> +             if (base_val)
>> +                     dev_info(&syncpt_to_dev(sp)->dev->dev,
>> +                                     "waitbase id %d val %d\n",
>> +                                     i, base_val);
>> +
>> +     }
>> +}
> 
> This should probably be integrated with debugfs.

I could move this to debug.c, but it's debugging aid when a command
stream is misbehaving and it spews this to UART when sync point wait is
timing out. So not debugfs stuff.

>> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
> 
> Autogenerated files are generally not acceptable. And I already
> mentioned before that you should be using #define instead of static
> inline functions for register and bit definitions.

What's the root cause for autogenerated files not being acceptable? I'm
autogenerating them from definitions I get from hardware, which makes
the results reliable.

I like static inline because I get the benefit of compiler type
checking, and gcov shows me which register definitions have been used in
different tests.

#defines are always messy and I pretty much hate them. But if the
general request is to use #define's, even though I don't agree, I can
accommodate. It's simple to write a sed script to do the conversion.

> This is a very funky way of creating sysfs attributes. What you probably
> want here are device attributes. See Documentation/filesystems/sysfs.txt
> and include/linux/sysfs.h.

Thanks for the pointers, looks like device attributes could be used.

>> +bool nvhost_syncpt_is_expired(
>> +     struct nvhost_syncpt *sp,
>> +     u32 id,
>> +     u32 thresh)
>> +{
>> +     u32 current_val;
>> +     u32 future_val;
>> +     smp_rmb();
> 
> What do you need a read memory barrier for?

I'll test without that. I can't see any valid reason, and I have a
couple of other similar problems.

>> +/* Displays the current value of the sync point via sysfs */
>> +static ssize_t syncpt_min_show(struct kobject *kobj,
>> +             struct kobj_attribute *attr, char *buf)
>> +{
>> +     struct nvhost_syncpt_attr *syncpt_attr =
>> +             container_of(attr, struct nvhost_syncpt_attr, attr);
>> +
>> +     return snprintf(buf, PAGE_SIZE, "%u",
>> +                     nvhost_syncpt_read(&syncpt_attr->host->syncpt,
>> +                             syncpt_attr->id));
>> +}
>> +
>> +static ssize_t syncpt_max_show(struct kobject *kobj,
>> +             struct kobj_attribute *attr, char *buf)
>> +{
>> +     struct nvhost_syncpt_attr *syncpt_attr =
>> +             container_of(attr, struct nvhost_syncpt_attr, attr);
>> +
>> +     return snprintf(buf, PAGE_SIZE, "%u",
>> +                     nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
>> +                             syncpt_attr->id));
>> +}
> 
> This looks like it belongs in debugfs.

This is actually the only interface to read the max value to user space,
which can be useful for doing some comparisons that take wrapping into
account. But we could just add IOCTLs and remove the sysfs entries.

>> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
> [...]
>> +struct host1x_device_info {
>> +     int             nb_channels;    /* host1x: num channels supported */
>> +     int             nb_pts;         /* host1x: num syncpoints supported */
>> +     int             nb_bases;       /* host1x: num syncpoints supported */
>> +     u32             client_managed; /* host1x: client managed syncpts */
>> +     int             nb_mlocks;      /* host1x: number of mlocks */
>> +     const char      **syncpt_names; /* names of sync points */
>> +};
>> +
>> +struct nvhost_device_data {
>> +     int             version;        /* ip version number of device */
>> +     int             id;             /* Separates clients of same hw */
>> +     int             index;          /* Hardware channel number */
>> +     void __iomem    *aperture;      /* Iomem mapped to kernel */
>> +
>> +     u32             syncpts;        /* Bitfield of sync points used */
>> +     u32             modulemutexes;  /* Bit field of module mutexes */
>> +
>> +     u32             class;          /* Device class */
>> +     bool            serialize;      /* Serialize submits in the channel */
>> +
>> +     int             powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
>> +     bool            can_powergate;  /* True if module can be power gated */
>> +     int             clockgate_delay;/* Delay before clock gated */
>> +     int             powergate_delay;/* Delay before power gated */
>> +     struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
>> +
>> +     struct delayed_work powerstate_down;/* Power state management */
>> +     int             num_clks;       /* Number of clocks opened for dev */
>> +     struct clk      *clk[NVHOST_MODULE_MAX_CLOCKS];
>> +     struct mutex    lock;           /* Power management lock */
>> +     int             powerstate;     /* Current power state */
>> +     int             refcount;       /* Number of tasks active */
>> +     wait_queue_head_t idle_wq;      /* Work queue for idle */
>> +
>> +     struct nvhost_channel *channel; /* Channel assigned for the module */
>> +     struct kobject *power_kobj;     /* kobj to hold power sysfs entries */
>> +     struct nvhost_device_power_attr *power_attrib;  /* sysfs attributes */
>> +     struct dentry *debugfs;         /* debugfs directory */
>> +
>> +     void *private_data;             /* private platform data */
>> +     struct platform_device *pdev;   /* owner platform_device */
>> +
>> +     /* Finalize power on. Can be used for context restore. */
>> +     void (*finalize_poweron)(struct platform_device *dev);
>> +
>> +     /* Device is busy. */
>> +     void (*busy)(struct platform_device *);
>> +
>> +     /* Device is idle. */
>> +     void (*idle)(struct platform_device *);
>> +
>> +     /* Device is going to be suspended */
>> +     void (*suspend_ndev)(struct platform_device *);
>> +
>> +     /* Device is initialized */
>> +     void (*init)(struct platform_device *dev);
>> +
>> +     /* Device is de-initialized. */
>> +     void (*deinit)(struct platform_device *dev);
>> +
>> +     /* Preparing for power off. Used for context save. */
>> +     int (*prepare_poweroff)(struct platform_device *dev);
>> +
>> +     /* Clock gating callbacks */
>> +     int (*prepare_clockoff)(struct platform_device *dev);
>> +     void (*finalize_clockon)(struct platform_device *dev);
>> +};
> 
> A lot of this can be removed if you use existing infrastructure and
> simplify the design a bit. Most of it can probably move into the main
> struct host1x to avoid needless indirections that make the code hard to
> follow and maintain.

Actually, this struct is generic for host1x and host1x clients, so they
cannot be moved to host1x. I do also realize that I'm not using them in
the patch sets I sent for 2D.

Terje

Thierry Reding Nov. 29, 2012, 11:47 a.m. UTC | #5

On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> On 28.11.2012 23:23, Thierry Reding wrote:
> > This could be problematic. Since drivers/video and drivers/gpu/drm are
> > separate trees, this would entail a continuous burden on keeping both
> > trees synchronized. While I realize that eventually it might be better
> > to put the host1x driver in a separate place to accomodate for its use
> > by other subsystems, I'm not sure moving it here right away is the best
> > approach.
> 
> I understand your point, but I hope also that we'd end up with something
> that can be used as basis for the downstream kernel to migrate to
> upstream stack.
> 
> The key point here is to make the API between nvhost and tegradrm as
> small and robust to changes as possible.

I agree. But I also fear that there will be changes eventually and
having both go in via different tree requires those trees to be merged
in a specific order to avoid breakage should the API change. This will
be particularly ugly in linux-next.

That's why I explicitly proposed to take this into drivers/gpu/drm/tegra
for the time being, until we can be reasonably sure that the API is
fixed. Then I'm fine with moving it wherever seems the best fit. Even
then there might be the occasional dependency, but they should get fewer
and fewer as the code matures.

> >> +struct nvhost_syncpt_ops {
> >> +     void (*reset)(struct nvhost_syncpt *, u32 id);
> >> +     void (*reset_wait_base)(struct nvhost_syncpt *, u32 id);
> >> +     void (*read_wait_base)(struct nvhost_syncpt *, u32 id);
> >> +     u32 (*update_min)(struct nvhost_syncpt *, u32 id);
> >> +     void (*cpu_incr)(struct nvhost_syncpt *, u32 id);
> >> +     void (*debug)(struct nvhost_syncpt *);
> >> +     const char * (*name)(struct nvhost_syncpt *, u32 id);
> >> +};
> > 
> > Why are these even defined as ops structure? Tegra20 and Tegra30 seem to
> > be compatible when it comes to handling syncpoints. I thought they would
> > even be compatible in all other aspects as well, so why even have this?
> 
> Tegra20 and Tegra30 are compatible, but future chips are not. I was
> hoping we would be ready in upstream kernel for future chips.

I think we should ignore that problem for now. Generally planning for
any possible combination of incompatibilities leads to overgeneralized
designs that require precisely these kinds of indirections.

Once some documentation for Tegra 40 materializes we can start thinking
about how to encapsulate the incompatible code.

> >> +#define syncpt_op()          (nvhost_get_chip_ops()->syncpt)
> > 
> > You really shouldn't be doing this, but rather use explicit accesses for
> > these structures. If you're design doesn't scatter these definitions
> > across several files then it isn't difficult to obtain the correct
> > pointers and you don't need these "shortcuts".
> 
> Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt
> or nvhost_intr structs?

Not quite. What I'm saying is that unless there is a reason to
encapsulate the functions into an ops structure (for instance because of
incompatibilities across chip generations) they shouldn't be pointers in
a struct at all.

For that matter I don't think you need the nvhost_syncpt and nvhost_intr
structures either.

> >> +bool host1x_powered(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_powered);
> >> +
> >> +void host1x_busy(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_busy);
> >> +
> >> +void host1x_idle(struct platform_device *dev)
> >> +{
> > [...]
> >> +}
> >> +EXPORT_SYMBOL(host1x_idle);
> > 
> > These look like a reimplementation of the runtime power-management
> > framework.
> 
> Yes, we at some point tried to move to use runtime PM. The first attempt
> was thwarted by runtime PM and system suspend conflicting with each
> other. I believe this is pretty much fixed in later versions of kernel.
> 
> Also, the problem was that runtime PM doesn't support multiple power
> states. In downstream kernel, we support clock gating and power gating.
> When we moved to runtime PM and implemented power gating on top of that,
> we ended up with more code than just using the current ACM code.
> 
> I have a developer starting to look into how we could take runtime PM
> again into use with proper power gating support. It'll take a while to
> get that right. It might be best to rip the dynamic power management out
> from this patch set, and introduce it later when we have a proper
> runtime PM solution.

Okay, sounds like a plan. Even if it turns out that the current runtime
PM implementation doesn't provide every functionality that you need, we
should try to get these changes into the existing frameworks instead of
copying large chunks of code.

> >> +static void nvhost_free_resources(struct nvhost_master *host)
> >> +{
> >> +}
> > 
> > This should be removed since it's empty.
> 
> True. I wonder how that happened - there was content since recently, but
> I guess I deleted the code without noticing that the function needs to
> go, too.

I noticed that it was filled with content in one of the subsequent
patches. Depending on how this gets merged eventually you could postpone
adding the function until the later patch. But perhaps once the code has
been properly reviewed we can just squash the patches again. We'll see.

> > Also you should be using platform_get_irq() for interrupts. Furthermore
> > the host1x DT node (and the TRM) name the interrupts "syncpt" and
> > "general", so maybe those would be more useful variable names than
> > "intr0" and "intr1".
> > 
> > But since you don't use them anyway they shouldn't be part of this
> > patch.
> 
> True. I might also as well delete the general interrupt altogether, as
> we don't use it for any real purpose.

I think it might still be useful for diagnostics. It seems to be used
when writes time out. That could still be helpful information when
debugging problems.

> >> +     for (i = 0; i < pdata->num_clks; i++)
> >> +             clk_prepare_enable(pdata->clk[i]);
> >> +     nvhost_syncpt_reset(&host->syncpt);
> >> +     for (i = 0; i < pdata->num_clks; i++)
> >> +             clk_disable_unprepare(pdata->clk[i]);
> > 
> > Stephen already hinted at this when discussing the AUXDATA. You should
> > explicitly request the clocks.
> 
> I'm not too happy about that idea. The clock management code is generic
> for all modules, and that's why it's driven by a data structure. Now
> Stephen and you ask me to unroll the loops and make copies of the code
> to facilitate different modules and different SoCs.

Making this generic for all modules may not be what we want as it
doesn't allow devices to handle things themselves if necessary. Clock
management is just part of the boiler plate that every driver is
supposed to cope with. Also the number of clocks is usually not higher
than 2 or 3, so the pain is manageable. =)

Furthermore doing this in loops may not work for all modules. Some may
require additional delays between enabling the clocks, others may be
able to selectively disable one clock but not the other(s).

> >> +static inline void *nvhost_get_private_data(struct platform_device *_dev)
> >> +{
> >> +     struct nvhost_device_data *pdata =
> >> +             (struct nvhost_device_data *)platform_get_drvdata(_dev);
> >> +     WARN_ON(!pdata);
> >> +     return (pdata && pdata->private_data) ? pdata->private_data : NULL;
> >> +}
> >> +
> >> +static inline void nvhost_set_private_data(struct platform_device *_dev,
> >> +     void *priv_data)
> >> +{
> >> +     struct nvhost_device_data *pdata =
> >> +             (struct nvhost_device_data *)platform_get_drvdata(_dev);
> >> +     WARN_ON(!pdata);
> >> +     if (pdata)
> >> +             pdata->private_data = priv_data;
> >> +}
> > 
> > You should need none of these. Instead put all the data you need into
> > you struct host1x and associate that with the platform device using
> > platform_set_drvdata().
> 
> I need to actually find a way to do this for both host1x itself, and the
> 2D module. But as said, I'll try to remove the auxdata and come up with
> something better.

The existing host1x and DRM code could serve as an example since I
explicitly wrote them to behave properly.

> >> +static inline
> >> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev)
> >> +{
> >> +     struct platform_device *pdev;
> >> +
> >> +     if (_dev->dev.parent) {
> >> +             pdev = to_platform_device(_dev->dev.parent);
> >> +             return nvhost_get_private_data(pdev);
> >> +     } else
> >> +             return nvhost_get_private_data(_dev);
> >> +}
> >> +
> >> +static inline
> >> +struct platform_device *nvhost_get_parent(struct platform_device *_dev)
> >> +{
> >> +     return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL;
> >> +}
> > 
> > These don't seem to be used.
> 
> nvhost_get_host() is used in a subsequent patch, but getting parent
> doesn't seem to be.

Again, if you look at the existing tegra-drm code, the client modules
already use something a bit more explicit to obtain a reference to the
host1x:

	struct host1x *host1x = dev_get_drvdata(pdev->dev.parent);

The good thing about it is that it very clearly says where the host1x
pointer should be coming from. Explicitness is good.

> > Usually you don't keep separate variables for subregions. This can
> > equally well be done with just adding a corresponding offset.
> 
> Hmm, ok, I could do that, but it just sounds logical to have only one
> piece of code that finds the sync reg base. I don't really understand
> why it needs to be duplicate for every access.

You wouldn't actually be duplicating it. Rather you'd just add another
offset. But I commented on this more explicitly in a reply to one of the
other patches.

> >> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp)
> >> +{
> >> +     u32 i;
> >> +     for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) {
> >> +             u32 max = nvhost_syncpt_read_max(sp, i);
> >> +             u32 min = nvhost_syncpt_update_min(sp, i);
> >> +             if (!max && !min)
> >> +                     continue;
> >> +             dev_info(&syncpt_to_dev(sp)->dev->dev,
> >> +                     "id %d (%s) min %d max %d\n",
> >> +                     i, syncpt_op().name(sp, i),
> >> +                     min, max);
> >> +
> >> +     }
> >> +
> >> +     for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) {
> >> +             u32 base_val;
> >> +             host1x_syncpt_read_wait_base(sp, i);
> >> +             base_val = sp->base_val[i];
> >> +             if (base_val)
> >> +                     dev_info(&syncpt_to_dev(sp)->dev->dev,
> >> +                                     "waitbase id %d val %d\n",
> >> +                                     i, base_val);
> >> +
> >> +     }
> >> +}
> > 
> > This should probably be integrated with debugfs.
> 
> I could move this to debug.c, but it's debugging aid when a command
> stream is misbehaving and it spews this to UART when sync point wait is
> timing out. So not debugfs stuff.

Okay, in that case it should stay in. Perhaps convert dev_info() to
dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
guards would also be useful. Maybe not.

> >> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h
> > 
> > Autogenerated files are generally not acceptable. And I already
> > mentioned before that you should be using #define instead of static
> > inline functions for register and bit definitions.
> 
> What's the root cause for autogenerated files not being acceptable? I'm
> autogenerating them from definitions I get from hardware, which makes
> the results reliable.

The problem is not with autogenerated files in general. The means by
which they are generated are less important. However, autogenerated
files often contain a lot of unneeded definitions and contain things
such as "autogenerated - do not edit" lines.

So generally if you generate the content using some scripts to make sure
it corresponds to what engineering gave you, that's okay as long as you
make sure it has the correct form and doesn't contain any cruft.

> I like static inline because I get the benefit of compiler type
> checking, and gcov shows me which register definitions have been used in
> different tests.

Type checking shouldn't be necessary for simple defines. And I wasn't
aware that you could get the Linux kernel to write out data to be fed to
gcov.

> #defines are always messy and I pretty much hate them. But if the
> general request is to use #define's, even though I don't agree, I can
> accommodate. It's simple to write a sed script to do the conversion.

There are a lot of opportunities to abuse #defines but they are harmless
for register definitions. The Linux kernel is full of them and I haven't
yet seen any code that uses static inline functions for this purpose.

What you need to consider as well is that many people that work with the
Linux kernel expect code to be in a certain style. Register accesses of
the form

	writel(value, base + OFFSET);

are very common and expected to look a certain way, so if you write code
that doesn't comply with these guidelines you make it extra hard for
people to read the code. And that'll cost extra time, which people don't
usually have in excess.

> >> +/* Displays the current value of the sync point via sysfs */
> >> +static ssize_t syncpt_min_show(struct kobject *kobj,
> >> +             struct kobj_attribute *attr, char *buf)
> >> +{
> >> +     struct nvhost_syncpt_attr *syncpt_attr =
> >> +             container_of(attr, struct nvhost_syncpt_attr, attr);
> >> +
> >> +     return snprintf(buf, PAGE_SIZE, "%u",
> >> +                     nvhost_syncpt_read(&syncpt_attr->host->syncpt,
> >> +                             syncpt_attr->id));
> >> +}
> >> +
> >> +static ssize_t syncpt_max_show(struct kobject *kobj,
> >> +             struct kobj_attribute *attr, char *buf)
> >> +{
> >> +     struct nvhost_syncpt_attr *syncpt_attr =
> >> +             container_of(attr, struct nvhost_syncpt_attr, attr);
> >> +
> >> +     return snprintf(buf, PAGE_SIZE, "%u",
> >> +                     nvhost_syncpt_read_max(&syncpt_attr->host->syncpt,
> >> +                             syncpt_attr->id));
> >> +}
> > 
> > This looks like it belongs in debugfs.
> 
> This is actually the only interface to read the max value to user space,
> which can be useful for doing some comparisons that take wrapping into
> account. But we could just add IOCTLs and remove the sysfs entries.

Maybe you can explain the usefulness of this some more. Why would it be
easier to look at them in sysfs than in debugfs? You could be providing
a simple list of syncpoints along with min/max, name, requested status,
etc. in debugfs and it should be as easy to parse for both humans and
machines as sysfs. I don't think IOCTLs would be any gain as they tend
to have higher ABI stability requirements than debugfs (which doesn't
have very strong requirements) or sysfs (which is often considered as a
public ABI as well and therefore needs to be stable).

> >> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h
> > [...]
> >> +struct host1x_device_info {
> >> +     int             nb_channels;    /* host1x: num channels supported */
> >> +     int             nb_pts;         /* host1x: num syncpoints supported */
> >> +     int             nb_bases;       /* host1x: num syncpoints supported */
> >> +     u32             client_managed; /* host1x: client managed syncpts */
> >> +     int             nb_mlocks;      /* host1x: number of mlocks */
> >> +     const char      **syncpt_names; /* names of sync points */
> >> +};
> >> +
> >> +struct nvhost_device_data {
> >> +     int             version;        /* ip version number of device */
> >> +     int             id;             /* Separates clients of same hw */
> >> +     int             index;          /* Hardware channel number */
> >> +     void __iomem    *aperture;      /* Iomem mapped to kernel */
> >> +
> >> +     u32             syncpts;        /* Bitfield of sync points used */
> >> +     u32             modulemutexes;  /* Bit field of module mutexes */
> >> +
> >> +     u32             class;          /* Device class */
> >> +     bool            serialize;      /* Serialize submits in the channel */
> >> +
> >> +     int             powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS];
> >> +     bool            can_powergate;  /* True if module can be power gated */
> >> +     int             clockgate_delay;/* Delay before clock gated */
> >> +     int             powergate_delay;/* Delay before power gated */
> >> +     struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */
> >> +
> >> +     struct delayed_work powerstate_down;/* Power state management */
> >> +     int             num_clks;       /* Number of clocks opened for dev */
> >> +     struct clk      *clk[NVHOST_MODULE_MAX_CLOCKS];
> >> +     struct mutex    lock;           /* Power management lock */
> >> +     int             powerstate;     /* Current power state */
> >> +     int             refcount;       /* Number of tasks active */
> >> +     wait_queue_head_t idle_wq;      /* Work queue for idle */
> >> +
> >> +     struct nvhost_channel *channel; /* Channel assigned for the module */
> >> +     struct kobject *power_kobj;     /* kobj to hold power sysfs entries */
> >> +     struct nvhost_device_power_attr *power_attrib;  /* sysfs attributes */
> >> +     struct dentry *debugfs;         /* debugfs directory */
> >> +
> >> +     void *private_data;             /* private platform data */
> >> +     struct platform_device *pdev;   /* owner platform_device */
> >> +
> >> +     /* Finalize power on. Can be used for context restore. */
> >> +     void (*finalize_poweron)(struct platform_device *dev);
> >> +
> >> +     /* Device is busy. */
> >> +     void (*busy)(struct platform_device *);
> >> +
> >> +     /* Device is idle. */
> >> +     void (*idle)(struct platform_device *);
> >> +
> >> +     /* Device is going to be suspended */
> >> +     void (*suspend_ndev)(struct platform_device *);
> >> +
> >> +     /* Device is initialized */
> >> +     void (*init)(struct platform_device *dev);
> >> +
> >> +     /* Device is de-initialized. */
> >> +     void (*deinit)(struct platform_device *dev);
> >> +
> >> +     /* Preparing for power off. Used for context save. */
> >> +     int (*prepare_poweroff)(struct platform_device *dev);
> >> +
> >> +     /* Clock gating callbacks */
> >> +     int (*prepare_clockoff)(struct platform_device *dev);
> >> +     void (*finalize_clockon)(struct platform_device *dev);
> >> +};
> > 
> > A lot of this can be removed if you use existing infrastructure and
> > simplify the design a bit. Most of it can probably move into the main
> > struct host1x to avoid needless indirections that make the code hard to
> > follow and maintain.
> 
> Actually, this struct is generic for host1x and host1x clients, so they
> cannot be moved to host1x. I do also realize that I'm not using them in
> the patch sets I sent for 2D.

I've said this before, and I think that this tries to be overly generic.
Display controllers for instance work quite well without an attached
nvhost_channel.

Thierry

Stephen Warren Nov. 29, 2012, 6:34 p.m. UTC | #6

On 11/29/2012 03:21 AM, Terje Bergström wrote:
> On 28.11.2012 23:23, Thierry Reding wrote:
...
>>> +     regs = platform_get_resource(dev, IORESOURCE_MEM, 0);
>>> +     intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0);
>>> +     intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1);
>>> +
>>> +     if (!regs || !intr0 || !intr1) {
>>
>> I prefer to have these checked for explicitly, one by one for
>> readability and potentially more useful diagnostics.
> 
> Can do.
> 
>> Also you should be using platform_get_irq() for interrupts. Furthermore
>> the host1x DT node (and the TRM) name the interrupts "syncpt" and
>> "general", so maybe those would be more useful variable names than
>> "intr0" and "intr1".
>>
>> But since you don't use them anyway they shouldn't be part of this
>> patch.
> 
> True. I might also as well delete the general interrupt altogether, as
> we don't use it for any real purpose.

Do make sure the interrupts still are part of the DT binding though, so
that the binding fully describes the HW, and the interrupt is available
to retrieve if we ever do use it in the future.

>>> +     for (i = 0; i < pdata->num_clks; i++)
>>> +             clk_prepare_enable(pdata->clk[i]);
>>> +     nvhost_syncpt_reset(&host->syncpt);
>>> +     for (i = 0; i < pdata->num_clks; i++)
>>> +             clk_disable_unprepare(pdata->clk[i]);
>>
>> Stephen already hinted at this when discussing the AUXDATA. You should
>> explicitly request the clocks.
> 
> I'm not too happy about that idea. The clock management code is generic
> for all modules, and that's why it's driven by a data structure. Now
> Stephen and you ask me to unroll the loops and make copies of the code
> to facilitate different modules and different SoCs.

You can still create tables of clocks inside the driver and loop over
them. So, loop unrolling isn't related to my comments at least. It's
just that clk_get() shouldn't take its parameters from platform data.

But if these are clocks for (arbitrary) child modules (that may or may
not exist dynamically), why aren't the drivers for the child modules
managing them?

Stephen Warren Nov. 29, 2012, 6:38 p.m. UTC | #7

On 11/29/2012 04:47 AM, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
>> On 28.11.2012 23:23, Thierry Reding wrote:
>>> This could be problematic. Since drivers/video and
>>> drivers/gpu/drm are separate trees, this would entail a
>>> continuous burden on keeping both trees synchronized. While I
>>> realize that eventually it might be better to put the host1x
>>> driver in a separate place to accomodate for its use by other
>>> subsystems, I'm not sure moving it here right away is the best 
>>> approach.
>> 
>> I understand your point, but I hope also that we'd end up with
>> something that can be used as basis for the downstream kernel to
>> migrate to upstream stack.
>> 
>> The key point here is to make the API between nvhost and tegradrm
>> as small and robust to changes as possible.
> 
> I agree. But I also fear that there will be changes eventually and 
> having both go in via different tree requires those trees to be
> merged in a specific order to avoid breakage should the API change.
> This will be particularly ugly in linux-next.
> 
> That's why I explicitly proposed to take this into
> drivers/gpu/drm/tegra for the time being, until we can be
> reasonably sure that the API is fixed. Then I'm fine with moving it
> wherever seems the best fit. Even then there might be the
> occasional dependency, but they should get fewer and fewer as the
> code matures.

It is acceptable for one maintainer to ack patches, and another
maintainer to merge a series that touches both "their own" code and
code owned by another tree. This should of course only be needed when
inter-module APIs change; changes to code within a module shouldn't
require this.

Thierry Reding Nov. 30, 2012, 6:52 a.m. UTC | #8

On Thu, Nov 29, 2012 at 11:38:11AM -0700, Stephen Warren wrote:
> On 11/29/2012 04:47 AM, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> >> On 28.11.2012 23:23, Thierry Reding wrote:
> >>> This could be problematic. Since drivers/video and
> >>> drivers/gpu/drm are separate trees, this would entail a
> >>> continuous burden on keeping both trees synchronized. While I
> >>> realize that eventually it might be better to put the host1x
> >>> driver in a separate place to accomodate for its use by other
> >>> subsystems, I'm not sure moving it here right away is the best 
> >>> approach.
> >> 
> >> I understand your point, but I hope also that we'd end up with
> >> something that can be used as basis for the downstream kernel to
> >> migrate to upstream stack.
> >> 
> >> The key point here is to make the API between nvhost and tegradrm
> >> as small and robust to changes as possible.
> > 
> > I agree. But I also fear that there will be changes eventually and 
> > having both go in via different tree requires those trees to be
> > merged in a specific order to avoid breakage should the API change.
> > This will be particularly ugly in linux-next.
> > 
> > That's why I explicitly proposed to take this into
> > drivers/gpu/drm/tegra for the time being, until we can be
> > reasonably sure that the API is fixed. Then I'm fine with moving it
> > wherever seems the best fit. Even then there might be the
> > occasional dependency, but they should get fewer and fewer as the
> > code matures.
> 
> It is acceptable for one maintainer to ack patches, and another
> maintainer to merge a series that touches both "their own" code and
> code owned by another tree. This should of course only be needed when
> inter-module APIs change; changes to code within a module shouldn't
> require this.

Yes, that's true. But it still makes things more complicated since each
of the maintainers will have to do extra work to test the changes.
Anyway we'll see how this plays out. The ideal case would of course be
to get the API right from the start. =)

Thierry

Thierry Reding Nov. 30, 2012, 6:53 a.m. UTC | #9

On Fri, Nov 30, 2012 at 08:54:32AM +0200, Terje Bergström wrote:
> On 29.11.2012 20:34, Stephen Warren wrote:
> > On 11/29/2012 03:21 AM, Terje Bergström wrote:
> >> True. I might also as well delete the general interrupt altogether, as
> >> we don't use it for any real purpose.
> > 
> > Do make sure the interrupts still are part of the DT binding though, so
> > that the binding fully describes the HW, and the interrupt is available
> > to retrieve if we ever do use it in the future.
> 
> Sure, I will just not use the generic irq in DT, but it won't require
> any changes in DT bindings.
> 
> > You can still create tables of clocks inside the driver and loop over
> > them. So, loop unrolling isn't related to my comments at least. It's
> > just that clk_get() shouldn't take its parameters from platform data.
> > 
> > But if these are clocks for (arbitrary) child modules (that may or may
> > not exist dynamically), why aren't the drivers for the child modules
> > managing them?
> 
> There are actually two things here that I mixed, and because of that I
> probably confused everybody else.
> 
> Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
> the clocks. That's why list of clocks and their frequency policies have
> been part of the device description in nvhost. ACM is being replaced
> with runtime PM in downstream kernel, but it still requires rigorous
> testing and analysis of power profile before we can move to it.
> 
> Then, the second thing is that nvhost_probe() has had its own loop to go
> through the clocks of host1x module. It's copy-paste of what ACM did,
> which is just bad design. That's easily replaceable with static code, as
> nvhost_probe() is just for host1x. I'll do that, and as I rip out the
> generic power management code, I'll also make 2D and host1x drivers
> enable the clocks at probe with static code.
> 
> So I think we have a solution that resonates with all proposals.

Yes, that sounds good to me.

Thierry

Terje Bergstrom Nov. 30, 2012, 6:54 a.m. UTC | #10

On 29.11.2012 20:34, Stephen Warren wrote:
> On 11/29/2012 03:21 AM, Terje Bergström wrote:
>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
> 
> Do make sure the interrupts still are part of the DT binding though, so
> that the binding fully describes the HW, and the interrupt is available
> to retrieve if we ever do use it in the future.

Sure, I will just not use the generic irq in DT, but it won't require
any changes in DT bindings.

> You can still create tables of clocks inside the driver and loop over
> them. So, loop unrolling isn't related to my comments at least. It's
> just that clk_get() shouldn't take its parameters from platform data.
> 
> But if these are clocks for (arbitrary) child modules (that may or may
> not exist dynamically), why aren't the drivers for the child modules
> managing them?

There are actually two things here that I mixed, and because of that I
probably confused everybody else.

Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns
the clocks. That's why list of clocks and their frequency policies have
been part of the device description in nvhost. ACM is being replaced
with runtime PM in downstream kernel, but it still requires rigorous
testing and analysis of power profile before we can move to it.

Then, the second thing is that nvhost_probe() has had its own loop to go
through the clocks of host1x module. It's copy-paste of what ACM did,
which is just bad design. That's easily replaceable with static code, as
nvhost_probe() is just for host1x. I'll do that, and as I rip out the
generic power management code, I'll also make 2D and host1x drivers
enable the clocks at probe with static code.

So I think we have a solution that resonates with all proposals.

Best regards,
Terje

Lucas Stach Nov. 30, 2012, 8:50 a.m. UTC | #11

Am Donnerstag, den 29.11.2012, 11:38 -0700 schrieb Stephen Warren:
> On 11/29/2012 04:47 AM, Thierry Reding wrote:
> > I agree. But I also fear that there will be changes eventually and 
> > having both go in via different tree requires those trees to be
> > merged in a specific order to avoid breakage should the API change.
> > This will be particularly ugly in linux-next.
> > 
> > That's why I explicitly proposed to take this into
> > drivers/gpu/drm/tegra for the time being, until we can be
> > reasonably sure that the API is fixed. Then I'm fine with moving it
> > wherever seems the best fit. Even then there might be the
> > occasional dependency, but they should get fewer and fewer as the
> > code matures.
> 
> It is acceptable for one maintainer to ack patches, and another
> maintainer to merge a series that touches both "their own" code and
> code owned by another tree. This should of course only be needed when
> inter-module APIs change; changes to code within a module shouldn't
> require this.
> 

I'm with Thierry here. I think there is a fair chance that we won't get
the API right from the start, even when trying to come up with something
that sounds sane to everyone. It's also not desirable to delay gr2d
going into mainline until we are all completely satisfied with the API.

I also fail to see how host1x module being in the DRM directory hinders
any downstream development. So I'm in favour of keeping host1x besides
the other DRM components to lower the burden for API changes and move it
out into some more generic directory, once we feel confident that the
API is reasonable stable.

Regards,
Lucas

Terje Bergstrom Nov. 30, 2012, 8:56 a.m. UTC | #12

On 29.11.2012 13:47, Thierry Reding wrote:
> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
>> Tegra20 and Tegra30 are compatible, but future chips are not. I was
>> hoping we would be ready in upstream kernel for future chips.
> 
> I think we should ignore that problem for now. Generally planning for
> any possible combination of incompatibilities leads to overgeneralized
> designs that require precisely these kinds of indirections.
> 
> Once some documentation for Tegra 40 materializes we can start thinking
> about how to encapsulate the incompatible code.

I think here our perspectives differ a lot. That is natural considering
the company I work for and company you work for, so let's try to sync
the perspective.

In my reality, whatever is in market is old news and I barely work on
them anymore. Upstreaming activity is the exception. 90% of my time is
spent dealing with future chips which I know cannot be handled without
this split to logical and physical driver parts.

For you, Tegra2 and Tegra3 are the reality.

If we move nvhost in upstream a bit incompatible, that's fine, like
ripping out features or adding new new stuff, like a new memory type.
All of this I can support with a good diff tool to get all the patches
flowing between upstream and downstream.

If we do fundamental changes that prevent bringing the code back to
downstream, like removing this abstraction, the whole process of
upstream and downstream converging hits a brick wall. We wouldn't have
proper continuing co-operation, but just pushing code out and being done
with it.

> I noticed that it was filled with content in one of the subsequent
> patches. Depending on how this gets merged eventually you could postpone
> adding the function until the later patch. But perhaps once the code has
> been properly reviewed we can just squash the patches again. We'll see.

Ok, thanks.

>> True. I might also as well delete the general interrupt altogether, as
>> we don't use it for any real purpose.
> 
> I think it might still be useful for diagnostics. It seems to be used
> when writes time out. That could still be helpful information when
> debugging problems.

It's actually a stale comment. The client units are not signaling
anything useful with the interrupt. There's use for it in downstream,
but that's irrelevant here.

> Making this generic for all modules may not be what we want as it
> doesn't allow devices to handle things themselves if necessary. Clock
> management is just part of the boiler plate that every driver is
> supposed to cope with. Also the number of clocks is usually not higher
> than 2 or 3, so the pain is manageable. =)
> 
> Furthermore doing this in loops may not work for all modules. Some may
> require additional delays between enabling the clocks, others may be
> able to selectively disable one clock but not the other(s).

Yes, but I'll just rip the power management code out, so we can postpone
this until we have validated and verified the runtime PM mechanism
downstream.

>> I could move this to debug.c, but it's debugging aid when a command
>> stream is misbehaving and it spews this to UART when sync point wait is
>> timing out. So not debugfs stuff.
> 
> Okay, in that case it should stay in. Perhaps convert dev_info() to
> dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
> guards would also be useful. Maybe not.

I could do that for upstream. In downstream it cannot depend on DEBUG
flag, as these spews are an important part of how we debug problems with
customer devices and the DEBUG flag is never on in customer builds.

> The problem is not with autogenerated files in general. The means by
> which they are generated are less important. However, autogenerated
> files often contain a lot of unneeded definitions and contain things
> such as "autogenerated - do not edit" lines.
> 
> So generally if you generate the content using some scripts to make sure
> it corresponds to what engineering gave you, that's okay as long as you
> make sure it has the correct form and doesn't contain any cruft.

I can remove the boilerplate, that's not a problem. In general, we have
tried to be very selective about what we generate, so that it matches
what we're using.

>> I like static inline because I get the benefit of compiler type
>> checking, and gcov shows me which register definitions have been used in
>> different tests.
> 
> Type checking shouldn't be necessary for simple defines. And I wasn't
> aware that you could get the Linux kernel to write out data to be fed to
> gcov.
> 
>> #defines are always messy and I pretty much hate them. But if the
>> general request is to use #define's, even though I don't agree, I can
>> accommodate. It's simple to write a sed script to do the conversion.
> 
> There are a lot of opportunities to abuse #defines but they are harmless
> for register definitions. The Linux kernel is full of them and I haven't
> yet seen any code that uses static inline functions for this purpose.

My problem is just that I know that the code generated is the same. What
we're talking about is that should we let the preprocessor or compiler
take care of this.

My take is that using preprocessor is not wise - it's the last resort if
there's no other proper way of doing things. Preprocessor requires all
sorts of extra parenthesis to protect against its deficiencies, and it
it merely a tool to do search-and-replace. Even multi-line needs special
treatment.

> What you need to consider as well is that many people that work with the
> Linux kernel expect code to be in a certain style. Register accesses of
> the form
> 
>         writel(value, base + OFFSET);
> 
> are very common and expected to look a certain way, so if you write code
> that doesn't comply with these guidelines you make it extra hard for
> people to read the code. And that'll cost extra time, which people don't
> usually have in excess.

But this has nothing to do with static inline vs. #define anymore, right?

> Maybe you can explain the usefulness of this some more. Why would it be
> easier to look at them in sysfs than in debugfs? You could be providing
> a simple list of syncpoints along with min/max, name, requested status,
> etc. in debugfs and it should be as easy to parse for both humans and
> machines as sysfs. I don't think IOCTLs would be any gain as they tend
> to have higher ABI stability requirements than debugfs (which doesn't
> have very strong requirements) or sysfs (which is often considered as a
> public ABI as well and therefore needs to be stable).

debugfs is just a debugging tool, and user space cannot rely on it. Only
developers can rely on existence of debugfs, as they have the means to
enable it.

sysfs is a place for actual APIs as you mention, and user space can rely
on them as proper APIs. That's what the values were exported for.

> I've said this before, and I think that this tries to be overly generic.
> Display controllers for instance work quite well without an attached
> nvhost_channel.

Yes, these structures aren't meant to be used by anything else than
units that are controlled by the host1x driver. DC, for example,
wouldn't have this.

Terje

Thierry Reding Nov. 30, 2012, 10:38 a.m. UTC | #13

On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
> On 29.11.2012 13:47, Thierry Reding wrote:
> > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote:
> >> Tegra20 and Tegra30 are compatible, but future chips are not. I was
> >> hoping we would be ready in upstream kernel for future chips.
> > 
> > I think we should ignore that problem for now. Generally planning for
> > any possible combination of incompatibilities leads to overgeneralized
> > designs that require precisely these kinds of indirections.
> > 
> > Once some documentation for Tegra 40 materializes we can start thinking
> > about how to encapsulate the incompatible code.
> 
> I think here our perspectives differ a lot. That is natural considering
> the company I work for and company you work for, so let's try to sync
> the perspective.
> 
> In my reality, whatever is in market is old news and I barely work on
> them anymore. Upstreaming activity is the exception. 90% of my time is
> spent dealing with future chips which I know cannot be handled without
> this split to logical and physical driver parts.
> 
> For you, Tegra2 and Tegra3 are the reality.

To be fair, Tegra2 and Tegra3 are the reality for *everyone* *outside*
NVIDIA.

It's great that you spend most of your time working with future chips,
but unless you submit the code for inclusion or review nobody upstream
needs to be concerned about the implications. Most people don't have
time to waste so we naturally try to keep the maintenance burden to a
minimum.

The above implies that when you submit code it shouldn't contain pieces
that prepare for possible future extensions which may or may not be
submitted (the exception being if such changes are part of a series
where subsequent patches actually use them). The outcome is that the
amount of cruft in the mainline kernel is kept to a minimum. And that's
a very good thing.

> If we move nvhost in upstream a bit incompatible, that's fine, like
> ripping out features or adding new new stuff, like a new memory type.
> All of this I can support with a good diff tool to get all the patches
> flowing between upstream and downstream.
> 
> If we do fundamental changes that prevent bringing the code back to
> downstream, like removing this abstraction, the whole process of
> upstream and downstream converging hits a brick wall. We wouldn't have
> proper continuing co-operation, but just pushing code out and being done
> with it.

Generally upstream doesn't concern itself with downstream. However we
still willingly accept code that is submitted for upstream inclusion
independent of where it comes from. The only requirements are that the
code conforms to the established standards and has gone through an
appropriate amount of review. Downstream maintenance is up to you. If
you need to maintain code that doesn't meet the above requirements or
that you don't want to submit or haven't got around to yet that's your
problem.

If you're serious about wanting to derive your downstream kernel from a
mainline kernel, then the only realistic way for you to reduce your
amount of work is to push your code upstream. And typically the earlier
you do so, the better.

> >> I could move this to debug.c, but it's debugging aid when a command
> >> stream is misbehaving and it spews this to UART when sync point wait is
> >> timing out. So not debugfs stuff.
> > 
> > Okay, in that case it should stay in. Perhaps convert dev_info() to
> > dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG
> > guards would also be useful. Maybe not.
> 
> I could do that for upstream. In downstream it cannot depend on DEBUG
> flag, as these spews are an important part of how we debug problems with
> customer devices and the DEBUG flag is never on in customer builds.

So I've just looked through these patches once more and I can't find
where this functionality is actually used. The host1x_syncpt_debug()
function is assigned to the nvhost_syncpt_ops.debug member, which in
turn is only used by nvhost_syncpt_debug(). The latter, however is
never used (not even by the debug infrastructure introduced in patch
4).

> >> I like static inline because I get the benefit of compiler type
> >> checking, and gcov shows me which register definitions have been used in
> >> different tests.
> > 
> > Type checking shouldn't be necessary for simple defines. And I wasn't
> > aware that you could get the Linux kernel to write out data to be fed to
> > gcov.
> > 
> >> #defines are always messy and I pretty much hate them. But if the
> >> general request is to use #define's, even though I don't agree, I can
> >> accommodate. It's simple to write a sed script to do the conversion.
> > 
> > There are a lot of opportunities to abuse #defines but they are harmless
> > for register definitions. The Linux kernel is full of them and I haven't
> > yet seen any code that uses static inline functions for this purpose.
> 
> My problem is just that I know that the code generated is the same. What
> we're talking about is that should we let the preprocessor or compiler
> take care of this.
> 
> My take is that using preprocessor is not wise - it's the last resort if
> there's no other proper way of doing things. Preprocessor requires all
> sorts of extra parenthesis to protect against its deficiencies, and it
> it merely a tool to do search-and-replace. Even multi-line needs special
> treatment.

Okay, so what you're saying here is that a huge number of people haven't
been wise in using the preprocessor for register definitions all these
years. That's a pretty bold statement. Now I obviously haven't looked at
every single line in the kernel, but I have never come across this usage
for static inline functions used for this. So, to be honest, I don't
think this is really up for discussion. Of course if you come up with an
example where this is done in a similar way I could be persuaded
otherwise.

> > What you need to consider as well is that many people that work with the
> > Linux kernel expect code to be in a certain style. Register accesses of
> > the form
> > 
> >         writel(value, base + OFFSET);
> > 
> > are very common and expected to look a certain way, so if you write code
> > that doesn't comply with these guidelines you make it extra hard for
> > people to read the code. And that'll cost extra time, which people don't
> > usually have in excess.
> 
> But this has nothing to do with static inline vs. #define anymore, right?

Of course it has. With the way you've chosen to define registers the
code will look like this:

	writel(value, base + offset_r())

Maybe it's just me, but when I read code like that I need additional
time to parse it as opposed to the canonical form.

> > Maybe you can explain the usefulness of this some more. Why would it be
> > easier to look at them in sysfs than in debugfs? You could be providing
> > a simple list of syncpoints along with min/max, name, requested status,
> > etc. in debugfs and it should be as easy to parse for both humans and
> > machines as sysfs. I don't think IOCTLs would be any gain as they tend
> > to have higher ABI stability requirements than debugfs (which doesn't
> > have very strong requirements) or sysfs (which is often considered as a
> > public ABI as well and therefore needs to be stable).
> 
> debugfs is just a debugging tool, and user space cannot rely on it. Only
> developers can rely on existence of debugfs, as they have the means to
> enable it.
> 
> sysfs is a place for actual APIs as you mention, and user space can rely
> on them as proper APIs. That's what the values were exported for.

But I don't see how that's relevant here. Let me quote what you said
originally:

> This is actually the only interface to read the max value to user space,
> which can be useful for doing some comparisons that take wrapping into
> account. But we could just add IOCTLs and remove the sysfs entries.

To me that sounded like it was only used for debugging purposes. If you
actually need to access this from a userspace driver then, as opposed to
what I said earlier, this should be handled by some IOCTL.

Thierry

Terje Bergstrom Dec. 1, 2012, 11:31 a.m. UTC | #14

On 30.11.2012 12:38, Thierry Reding wrote:
> * PGP Signed by an unknown key
> The above implies that when you submit code it shouldn't contain pieces
> that prepare for possible future extensions which may or may not be
> submitted (the exception being if such changes are part of a series
> where subsequent patches actually use them). The outcome is that the
> amount of cruft in the mainline kernel is kept to a minimum. And that's
> a very good thing.

We're now talking about actually a separation of logical and physical
driver. I can't see why that's a bad thing. Especially considering that
it's standard practice in well written drivers. Let's try to find a
technical clean solution instead of debating politics. The latter should
never be part of Linux kernel reviews.

>> I could do that for upstream. In downstream it cannot depend on DEBUG
>> flag, as these spews are an important part of how we debug problems with
>> customer devices and the DEBUG flag is never on in customer builds.
> 
> So I've just looked through these patches once more and I can't find
> where this functionality is actually used. The host1x_syncpt_debug()
> function is assigned to the nvhost_syncpt_ops.debug member, which in
> turn is only used by nvhost_syncpt_debug(). The latter, however is
> never used (not even by the debug infrastructure introduced in patch
> 4).

I have accidentally used the syncpt_op().debug() version directly. I'll
fix that.

> Okay, so what you're sayingCan here is that a huge number of people haven't
> been wise in using the preprocessor for register definitions all these
> years. That's a pretty bold statement. Now I obviously haven't looked at
> every single line in the kernel, but I have never come across this usage
> for static inline functions used for this. So, to be honest, I don't
> think this is really up for discussion. Of course if you come up with an
> example where this is done in a similar way I could be persuaded
> otherwise.

We must've talked about a bit different things. For pure register defs,
I can accommodate changing to #defines. We'd lose the code coverage
analysis, though, but if the parentheses are a make-or-break question to
upstreaming, I can change.

I was thinking of definitions like this:

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
{
	return (v & 0x1ff) << 0;
}

versus

#define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & 0x3ff

Both of these produce the same machine code and have same usage, but the
latter has type checking and code coverage analysis and the former is
(in my eyes) clearer. In both of these cases the usage is like this:

writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
		| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
		| host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
	m->sync_aperture + host1x_sync_cfpeek_ctrl_r());

> But I don't see how that's relevant here. Let me quote what you said
> originally:
> 
>> This is actually the only interface to read the max value to user space,
>> which can be useful for doing some comparisons that take wrapping into
>> account. But we could just add IOCTLs and remove the sysfs entries.
> 
> To me that sounded like it was only used for debugging purposes. If you
> actually need to access this from a userspace driver then, as opposed to
> what I said earlier, this should be handled by some IOCTL.

There's a use for production code to know both the max and min, but I
think we can just scope that use out from this patch sest.

User space can use these two for checking if one of their fences has
already passed by comparing if the current value is between min and
fence, taking wrapping into account. In these cases user space can f.ex.
leave a host1x wait out from a command stream.

Terje

Terje Bergstrom Dec. 1, 2012, 11:44 a.m. UTC | #15

On 30.11.2012 10:50, Lucas Stach wrote:
> I'm with Thierry here. I think there is a fair chance that we won't get
> the API right from the start, even when trying to come up with something
> that sounds sane to everyone. It's also not desirable to delay gr2d
> going into mainline until we are all completely satisfied with the API.
> 
> I also fail to see how host1x module being in the DRM directory hinders
> any downstream development. So I'm in favour of keeping host1x besides
> the other DRM components to lower the burden for API changes and move it
> out into some more generic directory, once we feel confident that the
> API is reasonable stable.

host1x module being in DRM directory hinders using nvhost from anywhere
outside DRM in both upstream and downstream. I also don't like first
putting the driver in one place, and then moving it with a huge commit
to another place. We'd just postpone exactly the problems that were
indicated earlier: we'd need to synchronize two trees to remove code in
one and add in another at the same time so that there wouldn't be
conflicting host1x drivers. I'd rather just add it in final place once,
and be done with it.

But if it's a make-it-or-brake-it for upstreaming, I can move it to be a
subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd
modify the MAINTAINER's file so that the tegradrm entry excludes host1x
sub-directory, and I'd add another one which included only the host1x
sub-directory? The host1x part would be Supported, whereas rest of
tegradrm is Maintained.

Best regards,
Terje

Daniel Vetter Dec. 1, 2012, 1:42 p.m. UTC | #16

On Sat, Dec 1, 2012 at 12:31 PM, Terje Bergström <tbergstrom@nvidia.com> wrote:
> We must've talked about a bit different things. For pure register defs,
> I can accommodate changing to #defines. We'd lose the code coverage
> analysis, though, but if the parentheses are a make-or-break question to
> upstreaming, I can change.

Out of sheer curiosity: What are you using the coverage data of these
register definitions for? When I looked into coverage analysis the
resulting data seemed rather useless to me, since the important thing
is how well we cover the entire dynamic state space of the hw+sw (e.g.
crap left behind by the bios ...) and coverage seemed to be a poor
proxy for that. Hence why I wonder what you're doing with this data
...
-Daniel

Thierry Reding Dec. 1, 2012, 2:58 p.m. UTC | #17

On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
> On 30.11.2012 12:38, Thierry Reding wrote:
> > * PGP Signed by an unknown key
> > The above implies that when you submit code it shouldn't contain pieces
> > that prepare for possible future extensions which may or may not be
> > submitted (the exception being if such changes are part of a series
> > where subsequent patches actually use them). The outcome is that the
> > amount of cruft in the mainline kernel is kept to a minimum. And that's
> > a very good thing.
> 
> We're now talking about actually a separation of logical and physical
> driver. I can't see why that's a bad thing. Especially considering that
> it's standard practice in well written drivers. Let's try to find a
> technical clean solution instead of debating politics. The latter should
> never be part of Linux kernel reviews.

I don't know where you see politics in what I said. All I'm saying is
that we shouldn't be making things needlessly complex. In my experience
the technically cleanest solution is usually the one with the least
complexity.

> > Okay, so what you're sayingCan here is that a huge number of people haven't
> > been wise in using the preprocessor for register definitions all these
> > years. That's a pretty bold statement. Now I obviously haven't looked at
> > every single line in the kernel, but I have never come across this usage
> > for static inline functions used for this. So, to be honest, I don't
> > think this is really up for discussion. Of course if you come up with an
> > example where this is done in a similar way I could be persuaded
> > otherwise.
> 
> We must've talked about a bit different things. For pure register defs,
> I can accommodate changing to #defines. We'd lose the code coverage
> analysis, though, but if the parentheses are a make-or-break question to
> upstreaming, I can change.
> 
> I was thinking of definitions like this:
> 
> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v)
> {
> 	return (v & 0x1ff) << 0;
> }
> 
> versus
> 
> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & 0x3ff
> 
> Both of these produce the same machine code and have same usage, but the
> latter has type checking and code coverage analysis and the former is
> (in my eyes) clearer. In both of these cases the usage is like this:
> 
> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1)
> 		| host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid)
> 		| host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr),
> 	m->sync_aperture + host1x_sync_cfpeek_ctrl_r());

Again there's no precedent for doing this with static inline functions.
You can do the same with macros. Type checking isn't an issue in these
cases since we're talking about bitfields for which no proper type
exists.

Two other things about the examples above: the definitions should be all
caps and it would be nice if they could be made a bit shorter.

> > But I don't see how that's relevant here. Let me quote what you said
> > originally:
> > 
> >> This is actually the only interface to read the max value to user space,
> >> which can be useful for doing some comparisons that take wrapping into
> >> account. But we could just add IOCTLs and remove the sysfs entries.
> > 
> > To me that sounded like it was only used for debugging purposes. If you
> > actually need to access this from a userspace driver then, as opposed to
> > what I said earlier, this should be handled by some IOCTL.
> 
> There's a use for production code to know both the max and min, but I
> think we can just scope that use out from this patch sest.
> 
> User space can use these two for checking if one of their fences has
> already passed by comparing if the current value is between min and
> fence, taking wrapping into account. In these cases user space can f.ex.
> leave a host1x wait out from a command stream.

But you already have extra code in the kernel to patch out expired sync-
points. Is it really worth the added effort to burden userspace with
this? If so I still think some kind of generic IOCTL to retrieve
information about a syncpoint would be better than a sysfs interface.

Thierry

Thierry Reding Dec. 1, 2012, 3:10 p.m. UTC | #18

On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
> On 30.11.2012 10:50, Lucas Stach wrote:
> > I'm with Thierry here. I think there is a fair chance that we won't get
> > the API right from the start, even when trying to come up with something
> > that sounds sane to everyone. It's also not desirable to delay gr2d
> > going into mainline until we are all completely satisfied with the API.
> > 
> > I also fail to see how host1x module being in the DRM directory hinders
> > any downstream development. So I'm in favour of keeping host1x besides
> > the other DRM components to lower the burden for API changes and move it
> > out into some more generic directory, once we feel confident that the
> > API is reasonable stable.
> 
> host1x module being in DRM directory hinders using nvhost from anywhere
> outside DRM in both upstream and downstream.

That's not true. Nothing keeps the rest of the kernel from using an API
exported by the tegra-drm driver.

> I also don't like first putting the driver in one place, and then
> moving it with a huge commit to another place.

Hehe, you're doing exactly that in this patch series. =)

> We'd just postpone exactly the problems that were indicated earlier:
> we'd need to synchronize two trees to remove code in one and add in
> another at the same time so that there wouldn't be conflicting host1x
> drivers. I'd rather just add it in final place once, and be done with
> it.

Yes, there would be a certain amount of synchronization needed, but as
Stephen correctly pointed out we could do that move through one tree
with the Acked-by of the other maintainer. The point is that we need to
do this once instead of everytime the API changes.

> But if it's a make-it-or-brake-it for upstreaming, I can move it to be a
> subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd
> modify the MAINTAINER's file so that the tegradrm entry excludes host1x
> sub-directory, and I'd add another one which included only the host1x
> sub-directory? The host1x part would be Supported, whereas rest of
> tegradrm is Maintained.

An entry for drivers/gpu/drm/tegra/host1x would override an entry for
drivers/gpu/drm/tegra so no need to exclude it. That said, there's no
way to exclude an subdirectory in MAINTAINERS that I know of.

My main point for keeping host1x within tegra-drm for now was that it
could possibly help speed up the inclusion of the host1x code. Seeing
that there's still a substantial amount of work to be done and a need
for discussion I'm not sure if rushing this is the best way. In that
case there may be justification for putting it in a separate location
from the start.

Thierry

Terje Bergstrom Dec. 1, 2012, 4:22 p.m. UTC | #19

On 01.12.2012 15:42, Daniel Vetter wrote:
> Out of sheer curiosity: What are you using the coverage data of these
> register definitions for? When I looked into coverage analysis the
> resulting data seemed rather useless to me, since the important thing
> is how well we cover the entire dynamic state space of the hw+sw (e.g.
> crap left behind by the bios ...) and coverage seemed to be a poor
> proxy for that. Hence why I wonder what you're doing with this data

Yes, it's a poor proxy. But still, I use it to determine how big
portions of hardware address space and fields I'm touching when running
host1x tests. It's interesting data for planning tests, but not much more.

Best regards,
Terje

Terje Bergstrom Dec. 1, 2012, 4:55 p.m. UTC | #20

On 01.12.2012 17:10, Thierry Reding wrote:
> On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
>> host1x module being in DRM directory hinders using nvhost from anywhere
>> outside DRM in both upstream and downstream.
> 
> That's not true. Nothing keeps the rest of the kernel from using an API
> exported by the tegra-drm driver.

Right, it's just a directory. I was actually thinking that it'd be weird
if a V4L2 driver would use something from inside drivers/gpu/drm/tegra
(V4L use DRM? Oh nooooo!).

Shoot the idea down if it's crazy, but please think about it first. :-)

I started thinking about this and we are constrained by the Linux kernel
subsystems that have a complete different architecture than hardware.
This leads to awkward design as DRM design as it conflicts with the way
hardware works.

Placing host1x driver in one place, DRM driver in another and XYZ driver
in yet another is not ideal either. We're exposing a public API which
needs to be strictly maintained, because we maintain drivers in
different trees, but then again, the list of users is very static and
well-defined, so public API is an overshoot.

How about if we look at this from the hardware architecture point of
view? You mentioned that perhaps drivers/bus/host1x would be the best
place for host1x driver.

What if we put also all host1x client modules under that same directory?
drivers/bus/host1x/drm would be for DRM interface, and all other host1x
client module drivers could be placed similarly. This way we could keep
the host1x API private to host1x and the client module drivers, and it's
easy to understand how host1x is used by just following the directory
structure.

Naturally, we could also think if we want to have sub-components per
host1x client (dc, 2d, etc) and a drm sub-component that implements the
DRM interface, and a V4L2 sub-component that implements V4L2 interface
(when/if I can convince people that camera should go upstream).

>> I also don't like first putting the driver in one place, and then
>> moving it with a huge commit to another place.
> 
> Hehe, you're doing exactly that in this patch series. =)

True, I guess it's just a matter of determining what's the best time.

> Yes, there would be a certain amount of synchronization needed, but as
> Stephen correctly pointed out we could do that move through one tree
> with the Acked-by of the other maintainer. The point is that we need to
> do this once instead of everytime the API changes.

Yep, inter-tree synchronization is possible, so not a show stopper.

> An entry for drivers/gpu/drm/tegra/host1x would override an entry for
> drivers/gpu/drm/tegra so no need to exclude it. That said, there's no
> way to exclude an subdirectory in MAINTAINERS that I know of.

I saw tag X: in MAINTAINERS file, so that could be used. There's
documentation for it, but also some examples like:

IBM Power Virtual SCSI/FC Device Drivers
M:	Robert Jennings <rcj@linux.vnet.ibm.com>
L:	linux-scsi@vger.kernel.org
S:	Supported
F:	drivers/scsi/ibmvscsi/
X:	drivers/scsi/ibmvscsi/ibmvstgt.c

> My main point for keeping host1x within tegra-drm for now was that it
> could possibly help speed up the inclusion of the host1x code. Seeing
> that there's still a substantial amount of work to be done and a need
> for discussion I'm not sure if rushing this is the best way. In that
> case there may be justification for putting it in a separate location
> from the start.

I'm not in a hurry, so let's try to figure the best design first.
Biggest architectural unsolved problem is the memory management and
relationship between tegradrm and host1x driver. What Lucas proposed
about memory management makes sense, but it'll take a while to implement it.

The rest of the unsolved questions are more about differences in
opinion, and solvable.

Terje

Terje Bergstrom Dec. 1, 2012, 5:13 p.m. UTC | #21

On 01.12.2012 16:58, Thierry Reding wrote:
> I don't know where you see politics in what I said. All I'm saying is
> that we shouldn't be making things needlessly complex. In my experience
> the technically cleanest solution is usually the one with the least
> complexity.

Let me come up with a proposal and let's then see where to go next.

> But you already have extra code in the kernel to patch out expired sync-
> points. Is it really worth the added effort to burden userspace with
> this? If so I still think some kind of generic IOCTL to retrieve
> information about a syncpoint would be better than a sysfs interface.

That's exactly why I mentioned that it's not useful to upstream. There
are some cases where user space might want to check if a fence has
passed without waiting for it, but that's marginal and could be handled
even with waits with zero timeout.

Terje

Lucas Stach Dec. 1, 2012, 5:34 p.m. UTC | #22

Am Samstag, den 01.12.2012, 18:55 +0200 schrieb Terje Bergström:
> On 01.12.2012 17:10, Thierry Reding wrote:
> > On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote:
> >> host1x module being in DRM directory hinders using nvhost from anywhere
> >> outside DRM in both upstream and downstream.
> > 
> > That's not true. Nothing keeps the rest of the kernel from using an API
> > exported by the tegra-drm driver.
> 
> Right, it's just a directory. I was actually thinking that it'd be weird
> if a V4L2 driver would use something from inside drivers/gpu/drm/tegra
> (V4L use DRM? Oh nooooo!).
> 
Yes it _is_ weird to have V4L using something which resides inside DRM,
but see below.

> Shoot the idea down if it's crazy, but please think about it first. :-)
> 
> I started thinking about this and we are constrained by the Linux kernel
> subsystems that have a complete different architecture than hardware.
> This leads to awkward design as DRM design as it conflicts with the way
> hardware works.
> 
> Placing host1x driver in one place, DRM driver in another and XYZ driver
> in yet another is not ideal either. We're exposing a public API which
> needs to be strictly maintained, because we maintain drivers in
> different trees, but then again, the list of users is very static and
> well-defined, so public API is an overshoot.

> How about if we look at this from the hardware architecture point of
> view? You mentioned that perhaps drivers/bus/host1x would be the best
> place for host1x driver.
> 
> What if we put also all host1x client modules under that same directory?
> drivers/bus/host1x/drm would be for DRM interface, and all other host1x
> client module drivers could be placed similarly. This way we could keep
> the host1x API private to host1x and the client module drivers, and it's
> easy to understand how host1x is used by just following the directory
> structure.
> 
This would certainly make life easier, but personally I don't think it's
the right thing to do. The separation of the Linux kernel into different
subsystems was done for a reason and just because the specific hardware
at hands happens to work a bit different is no valid reason to break
with the standard rules of the kernel.

So I think there is no way around handling the different drivers that
use host1x in different trees. For the time being there is _only_
tegra-drm using host1x in the upstream kernel. We have to make sure to
come up with some API which is reasonably stable, so we don't run into
big problems later. That's why I'm really in favour to keep host1x and
tegra-drm side by side in the current upstream, to make sure we can
change the API without jumping through too much hoops.

Your downstream V4L would have to use host1x from the DRM directory, but
really: is your downstream such a nice, clean codebase that you are not
able to cope with the slight ugliness of this solution?

> Naturally, we could also think if we want to have sub-components per
> host1x client (dc, 2d, etc) and a drm sub-component that implements the
> DRM interface, and a V4L2 sub-component that implements V4L2 interface
> (when/if I can convince people that camera should go upstream).
> 
To me this sound as if V4L upstream support is still a fair time away.
IMHO the right time to move out host1x is exactly the point when a
second user starts appearing upstream. This will give us some time to
fiddle with the API until we have to commit to it as being stable.

> >> I also don't like first putting the driver in one place, and then
> >> moving it with a huge commit to another place.
> > 
> > Hehe, you're doing exactly that in this patch series. =)
> 
> True, I guess it's just a matter of determining what's the best time.
> 
See above.

[...]
> I'm not in a hurry, so let's try to figure the best design first.
> Biggest architectural unsolved problem is the memory management and
> relationship between tegradrm and host1x driver. What Lucas proposed
> about memory management makes sense, but it'll take a while to implement it.

Please make sure to remove any unnecessary cruft from host1x in the
process and don't try to make too big of a step at once. We only need
one type of memory within host1x: native host1x objects, no need to plan
for support of anything else. Also taking over ownership of the IOMMU
address space might take some more work in the IOMMU API. We can leave
this out completely for a start. Both Tegra 2 and 3 should be able to
work with CMA backed objects just fine.

Regards,
Lucas

Terje Bergstrom Dec. 1, 2012, 7:29 p.m. UTC | #23

On 01.12.2012 19:34, Lucas Stach wrote:
> This would certainly make life easier, but personally I don't think it's
> the right thing to do. The separation of the Linux kernel into different
> subsystems was done for a reason and just because the specific hardware
> at hands happens to work a bit different is no valid reason to break
> with the standard rules of the kernel.
> 
> So I think there is no way around handling the different drivers that
> use host1x in different trees. For the time being there is _only_
> tegra-drm using host1x in the upstream kernel. We have to make sure to
> come up with some API which is reasonably stable, so we don't run into
> big problems later. That's why I'm really in favour to keep host1x and
> tegra-drm side by side in the current upstream, to make sure we can
> change the API without jumping through too much hoops.
> 
> Your downstream V4L would have to use host1x from the DRM directory, but
> really: is your downstream such a nice, clean codebase that you are not
> able to cope with the slight ugliness of this solution?

Ok, can do. I'll move the code base to drivers/gpu/drm/tegra/host1x. For
downstream, the host1x driver implements all user space APIs (no drm, no
v4l, etc) so the directory is of no consequence. If we immersed host1x
driver totally with tegra-drm, that'd be a problem, but if I can keep a
separation, that's fine.

> Please make sure to remove any unnecessary cruft from host1x in the
> process and don't try to make too big of a step at once. We only need
> one type of memory within host1x: native host1x objects, no need to plan
> for support of anything else. Also taking over ownership of the IOMMU
> address space might take some more work in the IOMMU API. We can leave
> this out completely for a start. Both Tegra 2 and 3 should be able to
> work with CMA backed objects just fine.

Ok, that simplifies the process. I'll just implement firewall and copy
the strema over to kernel space unconditionally.

Terje

Dave Airlie Dec. 1, 2012, 9:42 p.m. UTC | #24

Guys I think you guys might be overthniking things here.

I know you have some sort of upstream/downstream split, but really in
the upstream kernel, we don't care about that, so don't make it our
problem.

There is no need for any sort of stable API between host1x and the sub
drivers, we change APIs in the kernel the whole time it isn't a
problem.

If you need to change the API, submit a single patch changing it
across all the drivers in the tree, collecting Acks or not as needed.
We do this the whole time, I've never had or seen a problem with it.

We don't do separate subsystems APIs set in stone bullshit, and all
subsystem maintainers are used to dealing with these sort of issues.
You get an ack from one maintainer and the other one sticks it in his
tree with a note to Linus.

You can put the code where you want, maybe just under drivers/gpu
instead of drivers/video or drivers/gpu/drm, just make sure you have a
path for it into the kernel.

And I have an non-upstream precedent for v4l sitting on drm, some
radeon GPUs have capture tuners, and the only way to implement that
would be to stick a v4l driver in the radeon drm driver. Not a
problem, just never finished writing the code.

Dave.

Thierry Reding Dec. 1, 2012, 10:39 p.m. UTC | #25

On Sun, Dec 02, 2012 at 07:42:06AM +1000, Dave Airlie wrote:
> Guys I think you guys might be overthniking things here.
> 
> I know you have some sort of upstream/downstream split, but really in
> the upstream kernel, we don't care about that, so don't make it our
> problem.
> 
> There is no need for any sort of stable API between host1x and the sub
> drivers, we change APIs in the kernel the whole time it isn't a
> problem.

Point taken. I was primarily concerned about needless churn during early
development. But given the latest discussions it has become clear that
there's no need to rush things and therefore we should be able to
resolve any potential issues that could result in churn before the first
patches are merged.

> You can put the code where you want, maybe just under drivers/gpu
> instead of drivers/video or drivers/gpu/drm, just make sure you have a
> path for it into the kernel.

drivers/gpu/host1x sounds like a good location to me. Does that still go
in via your tree?

Thierry

Terje Bergstrom Dec. 2, 2012, 11:24 a.m. UTC | #26

On 01.12.2012 23:42, Dave Airlie wrote:
> Guys I think you guys might be overthniking things here.
> 
> I know you have some sort of upstream/downstream split, but really in
> the upstream kernel, we don't care about that, so don't make it our
> problem.

I am not trying to make anything your problem. Most of the issues we
have already worked out with a good solution that all active
participants have agreed with. We have only a couple of disagreements
with Thierry.

My goal is to get a good open source co-operation and trying to prevent
a code fork while still maintaining good design. That way everybody
wins. The way to do that is to base our BSP on upstream kernel.

I'm not trying to here throw code over the fence and flee. This is a
genuine attempt to work together. I want to prevent the "we" (kernel
community excluding NVIDIA) and "you" (NVIDIA) that a split code base
would cause in the long run. I'd like to just talk about "we" including
NVIDIA.

> There is no need for any sort of stable API between host1x and the sub
> drivers, we change APIs in the kernel the whole time it isn't a
> problem.
> 
> If you need to change the API, submit a single patch changing it
> across all the drivers in the tree, collecting Acks or not as needed.
> We do this the whole time, I've never had or seen a problem with it.
> 
> We don't do separate subsystems APIs set in stone bullshit, and all
> subsystem maintainers are used to dealing with these sort of issues.
> You get an ack from one maintainer and the other one sticks it in his
> tree with a note to Linus.
> 
> You can put the code where you want, maybe just under drivers/gpu
> instead of drivers/video or drivers/gpu/drm, just make sure you have a
> path for it into the kernel.

Follows exactly my thinking, as the location of host1x driver has no
practical consequence to me.

Thierry proposed drivers/gpu/host1x. I'd like to see a couple of
comments on that proposal, and if it sticks, follow that.

Thierry, did you mean that host1x driver would be in drivers/gpu/host1x,
and tegradrm in drivers/gpu/drm/tegra, or would we put both in same
directory?

> And I have an non-upstream precedent for v4l sitting on drm, some
> radeon GPUs have capture tuners, and the only way to implement that
> would be to stick a v4l driver in the radeon drm driver. Not a
> problem, just never finished writing the code.

Yes, I just mentioned that as awkward, but I have no problem with any path.

Terje

Thierry Reding Dec. 2, 2012, 8:55 p.m. UTC | #27

On Sun, Dec 02, 2012 at 01:24:13PM +0200, Terje Bergström wrote:
> On 01.12.2012 23:42, Dave Airlie wrote:
> > Guys I think you guys might be overthniking things here.
> > 
> > I know you have some sort of upstream/downstream split, but really in
> > the upstream kernel, we don't care about that, so don't make it our
> > problem.
> 
> I am not trying to make anything your problem. Most of the issues we
> have already worked out with a good solution that all active
> participants have agreed with. We have only a couple of disagreements
> with Thierry.
> 
> My goal is to get a good open source co-operation and trying to prevent
> a code fork while still maintaining good design. That way everybody
> wins. The way to do that is to base our BSP on upstream kernel.

Yes, that's exactly what you should be doing.

> I'm not trying to here throw code over the fence and flee. This is a
> genuine attempt to work together. I want to prevent the "we" (kernel
> community excluding NVIDIA) and "you" (NVIDIA) that a split code base
> would cause in the long run. I'd like to just talk about "we" including
> NVIDIA.

FWIW I'm convinced that you're genuinely trying to make this work and
nobody welcomes this more than me. However it is only natural if you
dump such a large body of code on the community that people will
disagree with some of the design decisions.

So when I comment on the design or patches in general, it is not my
intention to exclude you or NVIDIA in any way. All I'm trying to do is
spot problematic or unclear parts that will make working with the code
any more difficult than it has to be.

> > There is no need for any sort of stable API between host1x and the sub
> > drivers, we change APIs in the kernel the whole time it isn't a
> > problem.
> > 
> > If you need to change the API, submit a single patch changing it
> > across all the drivers in the tree, collecting Acks or not as needed.
> > We do this the whole time, I've never had or seen a problem with it.
> > 
> > We don't do separate subsystems APIs set in stone bullshit, and all
> > subsystem maintainers are used to dealing with these sort of issues.
> > You get an ack from one maintainer and the other one sticks it in his
> > tree with a note to Linus.
> > 
> > You can put the code where you want, maybe just under drivers/gpu
> > instead of drivers/video or drivers/gpu/drm, just make sure you have a
> > path for it into the kernel.
> 
> Follows exactly my thinking, as the location of host1x driver has no
> practical consequence to me.
> 
> Thierry proposed drivers/gpu/host1x. I'd like to see a couple of
> comments on that proposal, and if it sticks, follow that.
> 
> Thierry, did you mean that host1x driver would be in drivers/gpu/host1x,
> and tegradrm in drivers/gpu/drm/tegra, or would we put both in same
> directory?

Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can
also live with the host1x driver staying in drivers/video, but I don't
think it's the proper location and drivers/gpu/host1x seems like a much
better fit.

Thierry

Terje Bergstrom Dec. 3, 2012, 6:26 a.m. UTC | #28

On 02.12.2012 22:55, Thierry Reding wrote:
> FWIW I'm convinced that you're genuinely trying to make this work and
> nobody welcomes this more than me. However it is only natural if you
> dump such a large body of code on the community that people will
> disagree with some of the design decisions.
> 
> So when I comment on the design or patches in general, it is not my
> intention to exclude you or NVIDIA in any way. All I'm trying to do is
> spot problematic or unclear parts that will make working with the code
> any more difficult than it has to be.

Thanks, I know it'a a large dump and you've made great comments about
the code, and hit most of the sore spots I've had with the driver, too.
I appreciate your effort - this process is making the driver better.

It's good to hear that our goals are aligned. So now that we got that
out of the system, let's get back to business. :-)

> Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can
> also live with the host1x driver staying in drivers/video, but I don't
> think it's the proper location and drivers/gpu/host1x seems like a much
> better fit.

That sounds like a plan to me.

Terje

Stephen Warren Dec. 3, 2012, 7:20 p.m. UTC | #29

On 11/30/2012 03:38 AM, Thierry Reding wrote:
> On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
>> On 29.11.2012 13:47, Thierry Reding wrote:
>>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström
>>> wrote:
>>>> Tegra20 and Tegra30 are compatible, but future chips are not.
>>>> I was hoping we would be ready in upstream kernel for future
>>>> chips.
>>> 
>>> I think we should ignore that problem for now. Generally
>>> planning for any possible combination of incompatibilities
>>> leads to overgeneralized designs that require precisely these
>>> kinds of indirections.
>>> 
>>> Once some documentation for Tegra 40 materializes we can start
>>> thinking about how to encapsulate the incompatible code.
>> 
>> I think here our perspectives differ a lot. That is natural
>> considering the company I work for and company you work for, so
>> let's try to sync the perspective.
>> 
>> In my reality, whatever is in market is old news and I barely
>> work on them anymore. Upstreaming activity is the exception. 90%
>> of my time is spent dealing with future chips which I know cannot
>> be handled without this split to logical and physical driver
>> parts.
>> 
>> For you, Tegra2 and Tegra3 are the reality.
> 
> To be fair, Tegra2 and Tegra3 are the reality for *everyone*
> *outside* NVIDIA.
> 
> It's great that you spend most of your time working with future
> chips, but unless you submit the code for inclusion or review
> nobody upstream needs to be concerned about the implications. Most
> people don't have time to waste so we naturally try to keep the
> maintenance burden to a minimum.
> 
> The above implies that when you submit code it shouldn't contain
> pieces that prepare for possible future extensions which may or may
> not be submitted (the exception being if such changes are part of a
> series where subsequent patches actually use them). The outcome is
> that the amount of cruft in the mainline kernel is kept to a
> minimum. And that's a very good thing.

I think there's room for letting Terje's complete knowledge of future
chips guide the design of the current code that's sent upstream.
Certainly we shouldn't add a ton of unnecessary abstraction layers
right now that aren't needed for Tegra20/30, but if there's some
decision that doesn't affect the bloat, opaqueness, ... of the current
code but one choice is better for future development without serious
negatives for the current code, it's pretty reasonable to make that
decision rather than the other.

(That all said, I haven't really followed the details of this
particular point, so I can't say how my comment applies to any
decisions being made right now - just that we shouldn't blanket reject
future knowledge when making decisions)

After all, making the right decision now will reduce the number/size
of patches later, and hence reduce code churn and reviewer load.

Stephen Warren Dec. 3, 2012, 7:23 p.m. UTC | #30

On 12/01/2012 07:58 AM, Thierry Reding wrote:
> On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
...
>> I was thinking of definitions like this:
>> 
>> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) { 
>> return (v & 0x1ff) << 0; }
>> 
>> versus
>> 
>> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) &
>> 0x3ff
>> 
>> Both of these produce the same machine code and have same usage,
>> but the latter has type checking and code coverage analysis and
>> the former is (in my eyes) clearer. In both of these cases the
>> usage is like this:
>> 
>> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) |
>> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) |
>> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture +
>> host1x_sync_cfpeek_ctrl_r());
> 
> Again there's no precedent for doing this with static inline
> functions. You can do the same with macros. Type checking isn't an
> issue in these cases since we're talking about bitfields for which
> no proper type exists.

I suspect the inline functions could encode signed-vs-unsigned fields,
perhaps catch u8 variables when they should have been u32, etc.?

Thierry Reding Dec. 3, 2012, 9:03 p.m. UTC | #31

On Mon, Dec 03, 2012 at 12:20:30PM -0700, Stephen Warren wrote:
> On 11/30/2012 03:38 AM, Thierry Reding wrote:
> > On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote:
> >> On 29.11.2012 13:47, Thierry Reding wrote:
> >>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström
> >>> wrote:
> >>>> Tegra20 and Tegra30 are compatible, but future chips are not.
> >>>> I was hoping we would be ready in upstream kernel for future
> >>>> chips.
> >>> 
> >>> I think we should ignore that problem for now. Generally
> >>> planning for any possible combination of incompatibilities
> >>> leads to overgeneralized designs that require precisely these
> >>> kinds of indirections.
> >>> 
> >>> Once some documentation for Tegra 40 materializes we can start
> >>> thinking about how to encapsulate the incompatible code.
> >> 
> >> I think here our perspectives differ a lot. That is natural
> >> considering the company I work for and company you work for, so
> >> let's try to sync the perspective.
> >> 
> >> In my reality, whatever is in market is old news and I barely
> >> work on them anymore. Upstreaming activity is the exception. 90%
> >> of my time is spent dealing with future chips which I know cannot
> >> be handled without this split to logical and physical driver
> >> parts.
> >> 
> >> For you, Tegra2 and Tegra3 are the reality.
> > 
> > To be fair, Tegra2 and Tegra3 are the reality for *everyone*
> > *outside* NVIDIA.
> > 
> > It's great that you spend most of your time working with future
> > chips, but unless you submit the code for inclusion or review
> > nobody upstream needs to be concerned about the implications. Most
> > people don't have time to waste so we naturally try to keep the
> > maintenance burden to a minimum.
> > 
> > The above implies that when you submit code it shouldn't contain
> > pieces that prepare for possible future extensions which may or may
> > not be submitted (the exception being if such changes are part of a
> > series where subsequent patches actually use them). The outcome is
> > that the amount of cruft in the mainline kernel is kept to a
> > minimum. And that's a very good thing.
> 
> I think there's room for letting Terje's complete knowledge of future
> chips guide the design of the current code that's sent upstream.
> Certainly we shouldn't add a ton of unnecessary abstraction layers
> right now that aren't needed for Tegra20/30, but if there's some
> decision that doesn't affect the bloat, opaqueness, ... of the current
> code but one choice is better for future development without serious
> negatives for the current code, it's pretty reasonable to make that
> decision rather than the other.

The original point was that the current design stashes every function of
host1x into an ops structure and you have to go through those ops to get
at the functionality. I can understand the need to add an ops structure
to cope with incompatibilities between versions, but as you say there
should to be a reason for them being introduced. If such reasons exists,
then I think they at least warrant a comment somewhere.

Furthermore this is usually best handled by wrapping the ops accesses in
a public API, so that the ops structure can be hidden within the driver.
For example, submitting a job to a channel should have a public API such
as:

	int host1x_channel_submit(struct host1x_channel *channel,
				  struct host1x_job *job)
	{
		...
	}

An initial implementation would just add the code into this function. If
it turns out some future version requires special incantations to submit
a job, only then should we introduce an ops structure, with only the one
function:

	struct host1x_channel_ops {
		int (*submit)(struct host1x_channel *channel,
			      struct host1x_job *job);
	};

But since only the public API above has been used, access to the special
implementation can be hidden from the user. So the public function could
be modified in this way:

	int host1x_channel_submit(struct hostx1_channel *channel,
				  struct host1x_job *job)
	{
		if (channel->ops && channel->ops->submit)
			return channel->ops->submit(channel, job);

		...
	}

And then you have two choices: either you keep the code for previous
generations after the if block or you provide a separate ops structure
for older generations as well and handle them via the same code path.

One other thing that such a design can help with is refactoring common
code or parameterizing code. Maybe newer generations are not compatible
but can easily be made to work with existing code by introducing a
variable such as register stride or something.

What's really difficult to follow is if an ops structure is accessed via
some global macro. It also breaks encapsulation because you have a
global ops structure. That may even work fine for now, but it will break
once you have more than a single host1x in a system. I know this will
never happen, but all of a sudden it happens anyway and the code breaks.
Doing this right isn't very hard and it will lead to a better design and
is less likely to break at some point.

Thierry

Mark Zhang Dec. 4, 2012, 2:08 a.m. UTC | #32

On 12/04/2012 05:03 AM, Thierry Reding wrote:
[...]
>> I think there's room for letting Terje's complete knowledge of future
>> chips guide the design of the current code that's sent upstream.
>> Certainly we shouldn't add a ton of unnecessary abstraction layers
>> right now that aren't needed for Tegra20/30, but if there's some
>> decision that doesn't affect the bloat, opaqueness, ... of the current
>> code but one choice is better for future development without serious
>> negatives for the current code, it's pretty reasonable to make that
>> decision rather than the other.
> 
> The original point was that the current design stashes every function of
> host1x into an ops structure and you have to go through those ops to get
> at the functionality. I can understand the need to add an ops structure
> to cope with incompatibilities between versions, but as you say there
> should to be a reason for them being introduced. If such reasons exists,
> then I think they at least warrant a comment somewhere.
> 
> Furthermore this is usually best handled by wrapping the ops accesses in
> a public API, so that the ops structure can be hidden within the driver.
> For example, submitting a job to a channel should have a public API such
> as:
> 
> 	int host1x_channel_submit(struct host1x_channel *channel,
> 				  struct host1x_job *job)
> 	{
> 		...
> 	}
> 
> An initial implementation would just add the code into this function. If
> it turns out some future version requires special incantations to submit
> a job, only then should we introduce an ops structure, with only the one
> function:
> 
> 	struct host1x_channel_ops {
> 		int (*submit)(struct host1x_channel *channel,
> 			      struct host1x_job *job);
> 	};
> 
> But since only the public API above has been used, access to the special
> implementation can be hidden from the user. So the public function could
> be modified in this way:
> 
> 	int host1x_channel_submit(struct hostx1_channel *channel,
> 				  struct host1x_job *job)
> 	{
> 		if (channel->ops && channel->ops->submit)
> 			return channel->ops->submit(channel, job);
> 
> 		...
> 	}
> 

I guess we do this in exactly this way at the beginning. Then we
realized that we need to define callbacks to make different tegra has
different logics. So that's why we see the codes have lots of function
ops right now.

If so, this will make Terje modify the code back to the original
version, and this is not an interesting work.

Just my personal guess, no offence.

Mark
> And then you have two choices: either you keep the code for previous
> generations after the if block or you provide a separate ops structure
> for older generations as well and handle them via the same code path.
> 
> One other thing that such a design can help with is refactoring common
> code or parameterizing code. Maybe newer generations are not compatible
> but can easily be made to work with existing code by introducing a
> variable such as register stride or something.
> 
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.
> 
> Thierry
> 
> 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>

Mark Zhang Dec. 4, 2012, 2:11 a.m. UTC | #33

On 12/04/2012 05:03 AM, Thierry Reding wrote:
[...]
> 
> One other thing that such a design can help with is refactoring common
> code or parameterizing code. Maybe newer generations are not compatible
> but can easily be made to work with existing code by introducing a
> variable such as register stride or something.
> 
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.
> 

Sorry I forget to reply this in last mail...

Agree. Even for userspace programs, we should avoid using global
variables as possible as we can. So we need to think about this and try
to reduce the number of global vars.

Mark
> Thierry
> 
> 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>

Terje Bergstrom Dec. 4, 2012, 6:17 a.m. UTC | #34

On 03.12.2012 23:03, Thierry Reding wrote:
> What's really difficult to follow is if an ops structure is accessed via
> some global macro. It also breaks encapsulation because you have a
> global ops structure. That may even work fine for now, but it will break
> once you have more than a single host1x in a system. I know this will
> never happen, but all of a sudden it happens anyway and the code breaks.
> Doing this right isn't very hard and it will lead to a better design and
> is less likely to break at some point.

I agree that the chip ops access goes through too much indirection and
macro magic (which I already declared I hate), so we're going to design
something simpler.

Terje

Thierry Reding Dec. 4, 2012, 9:31 p.m. UTC | #35

On Mon, Dec 03, 2012 at 12:23:32PM -0700, Stephen Warren wrote:
> On 12/01/2012 07:58 AM, Thierry Reding wrote:
> > On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote:
> ...
> >> I was thinking of definitions like this:
> >> 
> >> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) { 
> >> return (v & 0x1ff) << 0; }
> >> 
> >> versus
> >> 
> >> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) &
> >> 0x3ff
> >> 
> >> Both of these produce the same machine code and have same usage,
> >> but the latter has type checking and code coverage analysis and
> >> the former is (in my eyes) clearer. In both of these cases the
> >> usage is like this:
> >> 
> >> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) |
> >> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) |
> >> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture +
> >> host1x_sync_cfpeek_ctrl_r());
> > 
> > Again there's no precedent for doing this with static inline
> > functions. You can do the same with macros. Type checking isn't an
> > issue in these cases since we're talking about bitfields for which
> > no proper type exists.
> 
> I suspect the inline functions could encode signed-vs-unsigned fields,
> perhaps catch u8 variables when they should have been u32, etc.?

I don't see how this would be relevant here. These definitions are only
used in the driver internally and not a public API, therefore none of
those checks should really be needed. If somebody writes code for this
driver and uses the register definitions, they better know what they're
doing. Or at least wrong usage should be filtered out through review.

In my opinion the consistency with how other drivers are written far
outweigh the benefits provided by inline functions. That said, I'm out
of arguments and I don't have a final say anyway, so if it is decided
to stick with inline functions I can find a way to live with them.

Thierry

Terje Bergstrom Dec. 10, 2012, 10:28 a.m. UTC | #36

On 29.11.2012 11:10, Mark Zhang wrote:
>> +
>> +/**
>> + * Write a cpu syncpoint increment to the hardware, without touching
>> + * the cache. Caller is responsible for host being powered.
>> + */
>> +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id)
>> +{
>> +     struct nvhost_master *dev = syncpt_to_dev(sp);
>> +     u32 reg_offset = id / 32;
>> +
>> +     if (!nvhost_module_powered(dev->dev)) {
>> +             dev_err(&syncpt_to_dev(sp)->dev->dev,
>> +                     "Trying to access host1x when it's off");
>> +             return;
>> +     }
>> +
>> +     if (!nvhost_syncpt_client_managed(sp, id)
>> +                     && nvhost_syncpt_min_eq_max(sp, id)) {
>> +             dev_err(&syncpt_to_dev(sp)->dev->dev,
>> +                     "Trying to increment syncpoint id %d beyond max\n",
>> +                     id);
>> +             return;
>> +     }
>> +     writel(BIT_MASK(id), dev->sync_aperture +
>> +                     host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4);
> 
> I have a stupid question: According to the name and the context of this
> function, seems it increases the syncpt value which specified by param
> "id". So how does this "writel" increase the value? I don't know much
> about host1x/syncpt reg operations, so could you explain a little bit or
> I just completely have a wrong understanding?

I believe I've implemented most of the requests in this mail, but I seem
to have missed answering this question.

writel() to that register invokes a method in host1x, which increments
the sync point indicated by the value of the register by one.

Best regards,
Terje

[RFC,v2,1/8] video: tegra: Add nvhost driver

Commit Message

Comments

Patch