Message ID | 1353935954-13763-2-git-send-email-tbergstrom@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Nov 26, 2012 at 02:19:07PM +0100, Terje Bergstrom wrote: > + > +struct nvhost_chip_support *nvhost_chip_ops; should be static? > +static int __devinit nvhost_alloc_resources(struct nvhost_master *host) > +{ > + int err; > + > + err = nvhost_init_chip_support(host); > + if (err) > + return err; > + > + return 0; nit: why not just return err - the 'if(err)' is unnecessary)? > + > + nvhost = host; I think this should be delayed until the init is complete as this variable is not cleared if there is a failure during init. Also I feel that the name nvhost is a bit short for an exported variable. > +static void to_state_running_locked(struct platform_device *dev) > +{ > + struct nvhost_device_data *pdata = platform_get_drvdata(dev); > + int prev_state = pdata->powerstate; > + > + if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED) > + to_state_clockgated_locked(dev); > + > + if (pdata->powerstate == NVHOST_POWER_STATE_CLOCKGATED) { > + int i; > + > + if (dev->dev.parent) > + nvhost_module_busy(to_platform_device(dev->dev.parent)); > + > + for (i = 0; i < pdata->num_clks; i++) { > + int err = clk_prepare_enable(pdata->clk[i]); > + if (err) { > + dev_err(&dev->dev, "Cannot turn on clock %s", > + pdata->clocks[i].name); > + return; In case of an error, returning here leaves some clocks turned on. Sivaram
On Mon, Nov 26, 2012 at 03:19:07PM +0200, Terje Bergstrom wrote: [...] > diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig > index fb9a14e..94c861b 100644 > --- a/drivers/video/Kconfig > +++ b/drivers/video/Kconfig > @@ -2463,4 +2463,6 @@ config FB_SH_MOBILE_MERAM > Up to 4 memory channels can be configured, allowing 4 RGB or > 2 YCbCr framebuffers to be configured. > > +source "drivers/video/tegra/host/Kconfig" > + This could be problematic. Since drivers/video and drivers/gpu/drm are separate trees, this would entail a continuous burden on keeping both trees synchronized. While I realize that eventually it might be better to put the host1x driver in a separate place to accomodate for its use by other subsystems, I'm not sure moving it here right away is the best approach. I'm not sure drivers/video is the best location either. Perhaps drivers/bus would be better? Or maybe we need a new subdirectory for this kind of device. > diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c > new file mode 100644 > index 0000000..5a44147 > --- /dev/null > +++ b/drivers/video/tegra/host/chip_support.c > @@ -0,0 +1,48 @@ > +/* > + * drivers/video/tegra/host/chip_support.c I think the general nowadays is to no longer use filenames in comments. [...] > +struct nvhost_chip_support *nvhost_chip_ops; > + > +struct nvhost_chip_support *nvhost_get_chip_ops(void) > +{ > + return nvhost_chip_ops; > +} This seems like it should be more tightly coupled to the host1x device. And it shouldn't be a global variable. > + > +int nvhost_init_chip_support(struct nvhost_master *host) > +{ > + if (nvhost_chip_ops == NULL) { > + nvhost_chip_ops = kzalloc(sizeof(*nvhost_chip_ops), GFP_KERNEL); > + if (nvhost_chip_ops == NULL) { > + pr_err("%s: Cannot allocate nvhost_chip_support\n", > + __func__); > + return -ENOMEM; > + } > + } > + > + nvhost_init_host1x01_support(host, nvhost_chip_ops); > + return 0; > +} We also don't need this. This should really be done by the central host1x device's initialization. > diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h [...] > +struct output; What's this? It doesn't seem to be used anywhere. > +struct nvhost_master; Why do you suffix this with _master? The whole point of host1x is to be the "master" so you can just as well call it nvhost, right? Ideally you'd call it host1x, but I'm repeating myself. =) > +struct nvhost_syncpt; > +struct platform_device; > + > +struct nvhost_syncpt_ops { > + void (*reset)(struct nvhost_syncpt *, u32 id); > + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id); > + void (*read_wait_base)(struct nvhost_syncpt *, u32 id); > + u32 (*update_min)(struct nvhost_syncpt *, u32 id); > + void (*cpu_incr)(struct nvhost_syncpt *, u32 id); > + void (*debug)(struct nvhost_syncpt *); > + const char * (*name)(struct nvhost_syncpt *, u32 id); > +}; Why are these even defined as ops structure? Tegra20 and Tegra30 seem to be compatible when it comes to handling syncpoints. I thought they would even be compatible in all other aspects as well, so why even have this? > + > +struct nvhost_chip_support { > + const char *soc_name; > + struct nvhost_syncpt_ops syncpt; > +}; > + > +struct nvhost_chip_support *nvhost_get_chip_ops(void); > + > +#define syncpt_op() (nvhost_get_chip_ops()->syncpt) You really shouldn't be doing this, but rather use explicit accesses for these structures. If you're design doesn't scatter these definitions across several files then it isn't difficult to obtain the correct pointers and you don't need these "shortcuts". > diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c [...] > +u32 host1x_syncpt_incr_max(u32 id, u32 incrs) > +{ > + struct nvhost_syncpt *sp = &nvhost->syncpt; > + return nvhost_syncpt_incr_max(sp, id, incrs); > +} > +EXPORT_SYMBOL(host1x_syncpt_incr_max); This API looks odd. Should syncpoints not be considered as regular resources, much like interrupts? In that case it would be easier to abstract them away behind an opaque type. It looks like you already use the struct nvhost_syncpt to refer to the set of syncpoints associated with a host1x device. How about you use nvhost/host1x_syncpt to refer to individual syncpoints instead. You could export an array of those from your host1x device and implement a basic resource allocation mechanism on top, similar to how other resources are handled in the kernel. So a host1x client device could call host1x_request_syncpt() to allocate a syncpoint from it's host1x parent dynamically along with passing a name and a syncpoint handler to it. > + > +void host1x_syncpt_incr(u32 id) > +{ > + struct nvhost_syncpt *sp = &nvhost->syncpt; > + nvhost_syncpt_incr(sp, id); > +} > +EXPORT_SYMBOL(host1x_syncpt_incr); Similarly, instead of passing an integer here, host1x clients would pass a pointer to the requested syncpoint instead. > +bool host1x_powered(struct platform_device *dev) > +{ [...] > +} > +EXPORT_SYMBOL(host1x_powered); > + > +void host1x_busy(struct platform_device *dev) > +{ [...] > +} > +EXPORT_SYMBOL(host1x_busy); > + > +void host1x_idle(struct platform_device *dev) > +{ [...] > +} > +EXPORT_SYMBOL(host1x_idle); These look like a reimplementation of the runtime power-management framework. > diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c [...] > +struct nvhost_master *nvhost; Bad habbit. I know that this is a popular shortcut. However this also leads to very bad designs because you're allowed to reuse this pointer from wherever you like. When I wrote the tegra-drm code I explicitly made sure to not use any such global variable. In the end it forces you to clean up the driver design. As a bonus you automatically get support for any number of host1x devices on the same SoC. Now you will probably tell me that this is never going to happen. People also used to think that a computers would never use more than a single CPU... > +static void power_on_host(struct platform_device *dev) > +{ > + struct nvhost_master *host = nvhost_get_private_data(dev); > + > + nvhost_syncpt_reset(&host->syncpt); > +} > + > +static int power_off_host(struct platform_device *dev) > +{ > + struct nvhost_master *host = nvhost_get_private_data(dev); > + > + nvhost_syncpt_save(&host->syncpt); > + return 0; > +} These seem like possible candidates for runtime PM. > + > +static void nvhost_free_resources(struct nvhost_master *host) > +{ > +} This should be removed since it's empty. > + > +static int __devinit nvhost_alloc_resources(struct nvhost_master *host) > +{ > + int err; > + > + err = nvhost_init_chip_support(host); > + if (err) > + return err; > + > + return 0; > +} Again, this chip support concept is not useful, so this function can go away as well. Also nvhost_init_chip_support() doesn't allocate any resources so it shouldn't be called from this function in the first place. > + > +static int __devinit nvhost_probe(struct platform_device *dev) > +{ > + struct nvhost_master *host; > + struct resource *regs, *intr0, *intr1; > + int i, err; > + struct nvhost_device_data *pdata = > + (struct nvhost_device_data *)dev->dev.platform_data; Platform data should not be used. Tegra is DT only. > + regs = platform_get_resource(dev, IORESOURCE_MEM, 0); > + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0); > + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1); > + > + if (!regs || !intr0 || !intr1) { I prefer to have these checked for explicitly, one by one for readability and potentially more useful diagnostics. Also you should be using platform_get_irq() for interrupts. Furthermore the host1x DT node (and the TRM) name the interrupts "syncpt" and "general", so maybe those would be more useful variable names than "intr0" and "intr1". But since you don't use them anyway they shouldn't be part of this patch. > + host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL); > + if (!host) > + return -ENOMEM; > + > + nvhost = host; > + > + host->dev = dev; > + > + /* Copy host1x parameters. The private_data gets replaced > + * by nvhost_master later */ Multiline comments should be in this format: /* * foo */ > + memcpy(&host->info, pdata->private_data, > + sizeof(struct host1x_device_info)); I don't think passing data in this way shouldn't be necessary as discussed in the subthread on the Tegra AUXDATA. > + > + pdata->finalize_poweron = power_on_host; > + pdata->prepare_poweroff = power_off_host; > + > + pdata->pdev = dev; > + > + /* set common host1x device data */ > + platform_set_drvdata(dev, pdata); > + > + /* set private host1x device data */ > + nvhost_set_private_data(dev, host); > + > + host->aperture = devm_request_and_ioremap(&dev->dev, regs); > + if (!host->aperture) { aperture is confusing as it is typically used for GTT-type memory regions, so it may be mistaken for the GART found on Tegra 2. Why not call it "regs" instead? > + dev_err(&dev->dev, "failed to remap host registers\n"); This is unnecessary. devm_request_and_ioremap() already prints an error message on failure. > + for (i = 0; i < pdata->num_clks; i++) > + clk_prepare_enable(pdata->clk[i]); > + nvhost_syncpt_reset(&host->syncpt); > + for (i = 0; i < pdata->num_clks; i++) > + clk_disable_unprepare(pdata->clk[i]); Stephen already hinted at this when discussing the AUXDATA. You should explicitly request the clocks. > +static int __exit nvhost_remove(struct platform_device *dev) This should really be __devexit to allow the driver to be built as a module. However, __dev* are deprecated and in the process of being removed so you can just drop __exit as well. > +static struct of_device_id host1x_match[] __devinitdata = { __devinitdata can be dropped. > + { .compatible = "nvidia,tegra20-host1x", }, > + { .compatible = "nvidia,tegra30-host1x", }, > + { }, > +}; > + > +static struct platform_driver platform_driver = { > + .probe = nvhost_probe, > + .remove = __exit_p(nvhost_remove), __exit_p also. > + .suspend = nvhost_suspend, > + .resume = nvhost_resume, > + .driver = { > + .owner = THIS_MODULE, > + .name = DRIVER_NAME, > + .of_match_table = of_match_ptr(host1x_match), No need for of_match_ptr(). > +static int __init nvhost_mod_init(void) > +{ > + return platform_driver_register(&platform_driver); > +} > + > +static void __exit nvhost_mod_exit(void) > +{ > + platform_driver_unregister(&platform_driver); > +} > + > +module_init(nvhost_mod_init); > +module_exit(nvhost_mod_exit); Use module_platform_driver(). > diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h [...] > +#define TRACE_MAX_LENGTH 128U > +#define IFACE_NAME "nvhost" None of these seem to be used. > +static inline void *nvhost_get_private_data(struct platform_device *_dev) > +{ > + struct nvhost_device_data *pdata = > + (struct nvhost_device_data *)platform_get_drvdata(_dev); > + WARN_ON(!pdata); > + return (pdata && pdata->private_data) ? pdata->private_data : NULL; > +} > + > +static inline void nvhost_set_private_data(struct platform_device *_dev, > + void *priv_data) > +{ > + struct nvhost_device_data *pdata = > + (struct nvhost_device_data *)platform_get_drvdata(_dev); > + WARN_ON(!pdata); > + if (pdata) > + pdata->private_data = priv_data; > +} You should need none of these. Instead put all the data you need into you struct host1x and associate that with the platform device using platform_set_drvdata(). > +static inline > +struct nvhost_master *nvhost_get_host(struct platform_device *_dev) > +{ > + struct platform_device *pdev; > + > + if (_dev->dev.parent) { > + pdev = to_platform_device(_dev->dev.parent); > + return nvhost_get_private_data(pdev); > + } else > + return nvhost_get_private_data(_dev); > +} > + > +static inline > +struct platform_device *nvhost_get_parent(struct platform_device *_dev) > +{ > + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL; > +} These don't seem to be used. > diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c > new file mode 100644 > index 0000000..d53302d > --- /dev/null > +++ b/drivers/video/tegra/host/host1x/host1x01.c > @@ -0,0 +1,37 @@ > +/* > + * drivers/video/tegra/host/host1x01.c > + * > + * Host1x init for T20 and T30 Architecture Chips > + * > + * Copyright (c) 2011-2012, NVIDIA Corporation. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see <http://www.gnu.org/licenses/>. > + */ > + > +#include <linux/nvhost.h> > + > +#include "host1x/host1x01.h" > +#include "host1x/host1x.h" > +#include "host1x/host1x01_hardware.h" > +#include "chip_support.h" > + > +#include "host1x/host1x_syncpt.c" > + > +int nvhost_init_host1x01_support(struct nvhost_master *host, > + struct nvhost_chip_support *op) > +{ > + host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE; Usually you don't keep separate variables for subregions. This can equally well be done with just adding a corresponding offset. Then again, I already said that this whole chip support concept is unnecessary and can be dropped. > diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c [...] > +/** > + * Write the current syncpoint value back to hw. > + */ > +static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id) > +{ > + struct nvhost_master *dev = syncpt_to_dev(sp); > + int min = nvhost_syncpt_read_min(sp, id); > + writel(min, dev->sync_aperture + (host1x_sync_syncpt_0_r() + id * 4)); > +} Again, better to represent individual syncpoints with opaque pointers and dereference them here. Obviously this file will need access to the structure definition. > +static void host1x_syncpt_debug(struct nvhost_syncpt *sp) > +{ > + u32 i; > + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { > + u32 max = nvhost_syncpt_read_max(sp, i); > + u32 min = nvhost_syncpt_update_min(sp, i); > + if (!max && !min) > + continue; > + dev_info(&syncpt_to_dev(sp)->dev->dev, > + "id %d (%s) min %d max %d\n", > + i, syncpt_op().name(sp, i), > + min, max); > + > + } > + > + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) { > + u32 base_val; > + host1x_syncpt_read_wait_base(sp, i); > + base_val = sp->base_val[i]; > + if (base_val) > + dev_info(&syncpt_to_dev(sp)->dev->dev, > + "waitbase id %d val %d\n", > + i, base_val); > + > + } > +} This should probably be integrated with debugfs. > diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h Autogenerated files are generally not acceptable. And I already mentioned before that you should be using #define instead of static inline functions for register and bit definitions. > diff --git a/drivers/video/tegra/host/nvhost_acm.c b/drivers/video/tegra/host/nvhost_acm.c [...] This whole file largely looks like a reimplementation of runtime PM. You should investigate if you can't reuse the existing infrastructure. > + /* Init the power sysfs attributes for this device */ > + pdata->power_attrib = devm_kzalloc(&dev->dev, > + sizeof(struct nvhost_device_power_attr), > + GFP_KERNEL); > + if (!pdata->power_attrib) { > + dev_err(&dev->dev, "Unable to allocate sysfs attributes\n"); > + return -ENOMEM; > + } > + pdata->power_attrib->ndev = dev; > + > + pdata->power_kobj = kobject_create_and_add("acm", &dev->dev.kobj); > + if (!pdata->power_kobj) { > + dev_err(&dev->dev, "Could not add dir 'power'\n"); > + err = -EIO; > + goto fail_attrib_alloc; > + } > + > + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]; > + attr->attr.name = "clockgate_delay"; > + attr->attr.mode = S_IWUSR | S_IRUGO; > + attr->show = clockgate_delay_show; > + attr->store = clockgate_delay_store; > + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { > + dev_err(&dev->dev, "Could not create sysfs attribute clockgate_delay\n"); > + err = -EIO; > + goto fail_clockdelay; > + } > + > + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]; > + attr->attr.name = "powergate_delay"; > + attr->attr.mode = S_IWUSR | S_IRUGO; > + attr->show = powergate_delay_show; > + attr->store = powergate_delay_store; > + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { > + dev_err(&dev->dev, "Could not create sysfs attribute powergate_delay\n"); > + err = -EIO; > + goto fail_powergatedelay; > + } > + > + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT]; > + attr->attr.name = "refcount"; > + attr->attr.mode = S_IRUGO; > + attr->show = refcount_show; > + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { > + dev_err(&dev->dev, "Could not create sysfs attribute refcount\n"); > + err = -EIO; > + goto fail_refcount; > + } This is a very funky way of creating sysfs attributes. What you probably want here are device attributes. See Documentation/filesystems/sysfs.txt and include/linux/sysfs.h. But if you can replace this by runtime PM, you'll get similar functionality for free anyway. > diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c [...] > +/** > + * Returns true if syncpoint is expired, false if we may need to wait > + */ > +bool nvhost_syncpt_is_expired( > + struct nvhost_syncpt *sp, > + u32 id, > + u32 thresh) > +{ > + u32 current_val; > + u32 future_val; > + smp_rmb(); What do you need a read memory barrier for? > +/* Displays the current value of the sync point via sysfs */ > +static ssize_t syncpt_min_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + struct nvhost_syncpt_attr *syncpt_attr = > + container_of(attr, struct nvhost_syncpt_attr, attr); > + > + return snprintf(buf, PAGE_SIZE, "%u", > + nvhost_syncpt_read(&syncpt_attr->host->syncpt, > + syncpt_attr->id)); > +} > + > +static ssize_t syncpt_max_show(struct kobject *kobj, > + struct kobj_attribute *attr, char *buf) > +{ > + struct nvhost_syncpt_attr *syncpt_attr = > + container_of(attr, struct nvhost_syncpt_attr, attr); > + > + return snprintf(buf, PAGE_SIZE, "%u", > + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt, > + syncpt_attr->id)); > +} This looks like it belongs in debugfs. > +int nvhost_syncpt_init(struct platform_device *dev, > + struct nvhost_syncpt *sp) > +{ > + int i; > + struct nvhost_master *host = syncpt_to_dev(sp); > + int err = 0; > + > + /* Allocate structs for min, max and base values */ > + sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), > + GFP_KERNEL); > + sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), > + GFP_KERNEL); > + sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp), > + GFP_KERNEL); > + sp->lock_counts = > + kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp), > + GFP_KERNEL); Again, I really think that syncpoints should be objects with corresponding attributes instead of keeping them in these arrays. > diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h [...] > +struct host1x_device_info { > + int nb_channels; /* host1x: num channels supported */ > + int nb_pts; /* host1x: num syncpoints supported */ > + int nb_bases; /* host1x: num syncpoints supported */ > + u32 client_managed; /* host1x: client managed syncpts */ > + int nb_mlocks; /* host1x: number of mlocks */ > + const char **syncpt_names; /* names of sync points */ > +}; > + > +struct nvhost_device_data { > + int version; /* ip version number of device */ > + int id; /* Separates clients of same hw */ > + int index; /* Hardware channel number */ > + void __iomem *aperture; /* Iomem mapped to kernel */ > + > + u32 syncpts; /* Bitfield of sync points used */ > + u32 modulemutexes; /* Bit field of module mutexes */ > + > + u32 class; /* Device class */ > + bool serialize; /* Serialize submits in the channel */ > + > + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS]; > + bool can_powergate; /* True if module can be power gated */ > + int clockgate_delay;/* Delay before clock gated */ > + int powergate_delay;/* Delay before power gated */ > + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */ > + > + struct delayed_work powerstate_down;/* Power state management */ > + int num_clks; /* Number of clocks opened for dev */ > + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS]; > + struct mutex lock; /* Power management lock */ > + int powerstate; /* Current power state */ > + int refcount; /* Number of tasks active */ > + wait_queue_head_t idle_wq; /* Work queue for idle */ > + > + struct nvhost_channel *channel; /* Channel assigned for the module */ > + struct kobject *power_kobj; /* kobj to hold power sysfs entries */ > + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */ > + struct dentry *debugfs; /* debugfs directory */ > + > + void *private_data; /* private platform data */ > + struct platform_device *pdev; /* owner platform_device */ > + > + /* Finalize power on. Can be used for context restore. */ > + void (*finalize_poweron)(struct platform_device *dev); > + > + /* Device is busy. */ > + void (*busy)(struct platform_device *); > + > + /* Device is idle. */ > + void (*idle)(struct platform_device *); > + > + /* Device is going to be suspended */ > + void (*suspend_ndev)(struct platform_device *); > + > + /* Device is initialized */ > + void (*init)(struct platform_device *dev); > + > + /* Device is de-initialized. */ > + void (*deinit)(struct platform_device *dev); > + > + /* Preparing for power off. Used for context save. */ > + int (*prepare_poweroff)(struct platform_device *dev); > + > + /* Clock gating callbacks */ > + int (*prepare_clockoff)(struct platform_device *dev); > + void (*finalize_clockon)(struct platform_device *dev); > +}; A lot of this can be removed if you use existing infrastructure and simplify the design a bit. Most of it can probably move into the main struct host1x to avoid needless indirections that make the code hard to follow and maintain. Thierry
On 11/26/2012 09:19 PM, Terje Bergström <tbergstrom@nvidia.com> wrote: > Add nvhost, the driver for host1x. This patch adds support for reading and > incrementing sync points and dynamic power management. > > Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> > > --- > drivers/video/Kconfig | 2 + > drivers/video/Makefile | 2 + > drivers/video/tegra/host/Kconfig | 5 + > drivers/video/tegra/host/Makefile | 10 + > drivers/video/tegra/host/chip_support.c | 48 ++ > drivers/video/tegra/host/chip_support.h | 52 +++ > drivers/video/tegra/host/dev.c | 96 ++++ > drivers/video/tegra/host/host1x/Makefile | 7 + > drivers/video/tegra/host/host1x/host1x.c | 204 +++++++++ > drivers/video/tegra/host/host1x/host1x.h | 78 ++++ > drivers/video/tegra/host/host1x/host1x01.c | 37 ++ > drivers/video/tegra/host/host1x/host1x01.h | 29 ++ > .../video/tegra/host/host1x/host1x01_hardware.h | 36 ++ > drivers/video/tegra/host/host1x/host1x_syncpt.c | 156 +++++++ > drivers/video/tegra/host/host1x/hw_host1x01_sync.h | 398 ++++++++++++++++ > drivers/video/tegra/host/nvhost_acm.c | 481 ++++++++++++++++++++ > drivers/video/tegra/host/nvhost_acm.h | 45 ++ > drivers/video/tegra/host/nvhost_syncpt.c | 333 ++++++++++++++ > drivers/video/tegra/host/nvhost_syncpt.h | 136 ++++++ > include/linux/nvhost.h | 143 ++++++ > 20 files changed, 2298 insertions(+) [...] > diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c > +#include "chip_support.h" > +#include "host1x/host1x01.h" > + > +struct nvhost_chip_support *nvhost_chip_ops; > + > +struct nvhost_chip_support *nvhost_get_chip_ops(void) > +{ > + return nvhost_chip_ops; > +} If you wanna hide "nvhost_chip_ops" from other source files, declare it as "static". So this is not a static member which means other files is able to touch it by "extern" but we also define a function to get it, and this looks redundant. [...] > diff --git a/drivers/video/tegra/host/host1x/Makefile b/drivers/video/tegra/host/host1x/Makefile > new file mode 100644 > index 0000000..330d507 > --- /dev/null > +++ b/drivers/video/tegra/host/host1x/Makefile > @@ -0,0 +1,7 @@ > +ccflags-y = -Idrivers/video/tegra/host > + > +nvhost-host1x-objs = \ > + host1x.o \ > + host1x01.o Can we rename this "host1x01.c"? I just really don't like this kind of variables/files, I mean, I can't imagine the purpose of the file according to it's name... [...] > + > +static int __devinit nvhost_alloc_resources(struct nvhost_master *host) > +{ > + int err; > + > + err = nvhost_init_chip_support(host); > + if (err) > + return err; > + > + return 0; > +} Just "return nvhost_init_chip_support(host)" is enough. If so, do we still need this function? [...] > + > +static int __devinit nvhost_probe(struct platform_device *dev) > + [...] > + dev_info(&dev->dev, "initialized\n"); > + > + return 0; > + > +fail: Add more "free" codes here. Actually, "nvhost_free_resources" frees the host->intr.syncpt which is not needed to free manually. Seems at least we need to add "nvhost_syncpt_deinit" here. [...] > + > +static struct of_device_id host1x_match[] __devinitdata = { > + { .compatible = "nvidia,tegra20-host1x", }, > + { .compatible = "nvidia,tegra30-host1x", }, Again, place tegra30-host1x before tegra20-host1x. [...] > + > +/** > + * Write a cpu syncpoint increment to the hardware, without touching > + * the cache. Caller is responsible for host being powered. > + */ > +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id) > +{ > + struct nvhost_master *dev = syncpt_to_dev(sp); > + u32 reg_offset = id / 32; > + > + if (!nvhost_module_powered(dev->dev)) { > + dev_err(&syncpt_to_dev(sp)->dev->dev, > + "Trying to access host1x when it's off"); > + return; > + } > + > + if (!nvhost_syncpt_client_managed(sp, id) > + && nvhost_syncpt_min_eq_max(sp, id)) { > + dev_err(&syncpt_to_dev(sp)->dev->dev, > + "Trying to increment syncpoint id %d beyond max\n", > + id); > + return; > + } > + writel(BIT_MASK(id), dev->sync_aperture + > + host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4); I have a stupid question: According to the name and the context of this function, seems it increases the syncpt value which specified by param "id". So how does this "writel" increase the value? I don't know much about host1x/syncpt reg operations, so could you explain a little bit or I just completely have a wrong understanding? [...] > + > +static ssize_t powergate_delay_store(struct kobject *kobj, > + struct kobj_attribute *attr, const char *buf, size_t count) > +{ > + int powergate_delay = 0, ret = 0; > + struct nvhost_device_power_attr *power_attribute = > + container_of(attr, struct nvhost_device_power_attr, > + power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]); > + struct platform_device *dev = power_attribute->ndev; > + struct nvhost_device_data *pdata = platform_get_drvdata(dev); > + > + if (!pdata->can_powergate) { > + dev_info(&dev->dev, "does not support power-gating\n"); > + return count; > + } > + > + mutex_lock(&pdata->lock); > + ret = sscanf(buf, "%d", &powergate_delay); > + if (ret == 1 && powergate_delay >= 0) > + pdata->powergate_delay = powergate_delay; > + else > + dev_err(&dev->dev, "Invalid powergate delay\n"); > + mutex_unlock(&pdata->lock); > + > + return count; Why we need to return an unchanged param? Seems param "count" doesn't make sense here. [...] > + > +int nvhost_module_init(struct platform_device *dev) > +{ > + int i = 0, err = 0; > + struct kobj_attribute *attr = NULL; > + struct nvhost_device_data *pdata = platform_get_drvdata(dev); > + > + /* initialize clocks to known state */ > + while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) { > + long rate = pdata->clocks[i].default_rate; > + struct clk *c; > + > + c = devm_clk_get(&dev->dev, pdata->clocks[i].name); > + if (IS_ERR_OR_NULL(c)) { > + dev_err(&dev->dev, "Cannot get clock %s\n", > + pdata->clocks[i].name); > + return -ENODEV; > + } > + > + rate = clk_round_rate(c, rate); > + clk_prepare_enable(c); > + clk_set_rate(c, rate); > + clk_disable_unprepare(c); > + pdata->clk[i] = c; > + i++; > + } > + pdata->num_clks = i; > + > + mutex_init(&pdata->lock); > + init_waitqueue_head(&pdata->idle_wq); > + INIT_DELAYED_WORK(&pdata->powerstate_down, powerstate_down_handler); > + > + /* power gate units that we can power gate */ > + if (pdata->can_powergate) { > + do_powergate_locked(pdata->powergate_ids[0]); > + do_powergate_locked(pdata->powergate_ids[1]); Seems we don't set these 2 powergate_ids. Does this mean we have not enabled power management feature in this version? [...] > + > +int nvhost_module_suspend(struct platform_device *dev) > +{ > + int ret; > + struct nvhost_device_data *pdata = platform_get_drvdata(dev); > + > + ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev), > + ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT); > + if (ret == 0) { > + dev_info(&dev->dev, "%s prevented suspend\n", > + dev_name(&dev->dev)); > + return -EBUSY; > + } > + I'm not sure whether there is a race condition here. We wait until this module is idle(refcount == 0), then try to powergate it next. But the wait queue function "powerstate_down_handler" might already powergate it. So we need to either "cancel_delayed_work(&pdate->powerstate_down)" before waiting the module to idle state or add some protection codes in "to_state_powergated_locked". > + mutex_lock(&pdata->lock); > + cancel_delayed_work(&pdata->powerstate_down); > + to_state_powergated_locked(dev); > + mutex_unlock(&pdata->lock); > + > + if (pdata->suspend_ndev) > + pdata->suspend_ndev(dev); > + > + return 0; > +} > + [...] > + > +int nvhost_syncpt_init(struct platform_device *dev, > + struct nvhost_syncpt *sp) > +{ > + int i; > + struct nvhost_master *host = syncpt_to_dev(sp); > + int err = 0; > + > + /* Allocate structs for min, max and base values */ > + sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), > + GFP_KERNEL); > + sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), > + GFP_KERNEL); > + sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp), > + GFP_KERNEL); > + sp->lock_counts = > + kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp), > + GFP_KERNEL); > + > + if (!(sp->min_val && sp->max_val && sp->base_val && sp->lock_counts)) { > + /* frees happen in the deinit */ > + err = -ENOMEM; > + goto fail; > + } > + > + sp->kobj = kobject_create_and_add("syncpt", &dev->dev.kobj); > + if (!sp->kobj) { > + err = -EIO; > + goto fail; > + } > + > + /* Allocate two attributes for each sync point: min and max */ > + sp->syncpt_attrs = kzalloc(sizeof(*sp->syncpt_attrs) > + * nvhost_syncpt_nb_pts(sp) * 2, GFP_KERNEL); > + if (!sp->syncpt_attrs) { > + err = -ENOMEM; > + goto fail; > + } > + > + /* Fill in the attributes */ > + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { > + char name[MAX_SYNCPT_LENGTH]; > + struct kobject *kobj; > + struct nvhost_syncpt_attr *min = &sp->syncpt_attrs[i*2]; > + struct nvhost_syncpt_attr *max = &sp->syncpt_attrs[i*2+1]; > + > + /* Create one directory per sync point */ > + snprintf(name, sizeof(name), "%d", i); > + kobj = kobject_create_and_add(name, sp->kobj); Where do we "kobject_put" this kobj? [...] > + if (!kobj) { > + err = -EIO; > + goto fail; > + } > + > + min->id = i; > + min->host = host; > + min->attr.attr.name = min_name; > + min->attr.attr.mode = S_IRUGO; > + min->attr.show = syncpt_min_show; > + if (sysfs_create_file(kobj, &min->attr.attr)) { > + err = -EIO; > + goto fail; > + } > + > + max->id = i; > + max->host = host; > + max->attr.attr.name = max_name; > + max->attr.attr.mode = S_IRUGO; > + max->attr.show = syncpt_max_show; > + if (sysfs_create_file(kobj, &max->attr.attr)) { > + err = -EIO; > + goto fail; > + } > + } > + > + return err; > + > +fail: > + nvhost_syncpt_deinit(sp); > + return err; > +} > + [...] > +/* public host1x sync-point management APIs */ > +u32 host1x_syncpt_incr_max(u32 id, u32 incrs); > +void host1x_syncpt_incr(u32 id); > +u32 host1x_syncpt_read(u32 id); > + > +#endif >
On 28.11.2012 23:23, Thierry Reding wrote: > This could be problematic. Since drivers/video and drivers/gpu/drm are > separate trees, this would entail a continuous burden on keeping both > trees synchronized. While I realize that eventually it might be better > to put the host1x driver in a separate place to accomodate for its use > by other subsystems, I'm not sure moving it here right away is the best > approach. I understand your point, but I hope also that we'd end up with something that can be used as basis for the downstream kernel to migrate to upstream stack. The key point here is to make the API between nvhost and tegradrm as small and robust to changes as possible. > I'm not sure drivers/video is the best location either. Perhaps > drivers/bus would be better? Or maybe we need a new subdirectory for > this kind of device. This is a question I don't have an answer to. I'm perfectly ok moving it wherever the public opinion leads it to. > I think the general nowadays is to no longer use filenames in comments. Ok, I hadn't noticed that. I'll remove them. It's redundant information anyway. >> +struct nvhost_chip_support *nvhost_chip_ops; >> + >> +struct nvhost_chip_support *nvhost_get_chip_ops(void) >> +{ >> + return nvhost_chip_ops; >> +} > > This seems like it should be more tightly coupled to the host1x device. > And it shouldn't be a global variable. Yeah, I will figure out a better way to handle the chip ops. I'm not too happy with it. Give me a bit of time to come up with a good solution. >> +struct output; > > What's this? It doesn't seem to be used anywhere. It's just a mistake. The struct is used in debug code, but not referred to in this file so the forward declaration is not needed. >> +struct nvhost_master; > > Why do you suffix this with _master? The whole point of host1x is to be > the "master" so you can just as well call it nvhost, right? Ideally > you'd call it host1x, but I'm repeating myself. =) Yes, the name is just a historic relict and I'm blind to them as I've been staring at the code for so long. I think "host1x" would be a good name for the struct. >> +struct nvhost_syncpt_ops { >> + void (*reset)(struct nvhost_syncpt *, u32 id); >> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id); >> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id); >> + u32 (*update_min)(struct nvhost_syncpt *, u32 id); >> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id); >> + void (*debug)(struct nvhost_syncpt *); >> + const char * (*name)(struct nvhost_syncpt *, u32 id); >> +}; > > Why are these even defined as ops structure? Tegra20 and Tegra30 seem to > be compatible when it comes to handling syncpoints. I thought they would > even be compatible in all other aspects as well, so why even have this? Tegra20 and Tegra30 are compatible, but future chips are not. I was hoping we would be ready in upstream kernel for future chips. >> +#define syncpt_op() (nvhost_get_chip_ops()->syncpt) > > You really shouldn't be doing this, but rather use explicit accesses for > these structures. If you're design doesn't scatter these definitions > across several files then it isn't difficult to obtain the correct > pointers and you don't need these "shortcuts". Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt or nvhost_intr structs? > This API looks odd. Should syncpoints not be considered as regular > resources, much like interrupts? In that case it would be easier to > abstract them away behind an opaque type. It looks like you already use > the struct nvhost_syncpt to refer to the set of syncpoints associated > with a host1x device. > > How about you use nvhost/host1x_syncpt to refer to individual syncpoints > instead. You could export an array of those from your host1x device and > implement a basic resource allocation mechanism on top, similar to how > other resources are handled in the kernel. > > So a host1x client device could call host1x_request_syncpt() to allocate > a syncpoint from it's host1x parent dynamically along with passing a > name and a syncpoint handler to it. That might work. I'll think about that - thanks. >> +bool host1x_powered(struct platform_device *dev) >> +{ > [...] >> +} >> +EXPORT_SYMBOL(host1x_powered); >> + >> +void host1x_busy(struct platform_device *dev) >> +{ > [...] >> +} >> +EXPORT_SYMBOL(host1x_busy); >> + >> +void host1x_idle(struct platform_device *dev) >> +{ > [...] >> +} >> +EXPORT_SYMBOL(host1x_idle); > > These look like a reimplementation of the runtime power-management > framework. Yes, we at some point tried to move to use runtime PM. The first attempt was thwarted by runtime PM and system suspend conflicting with each other. I believe this is pretty much fixed in later versions of kernel. Also, the problem was that runtime PM doesn't support multiple power states. In downstream kernel, we support clock gating and power gating. When we moved to runtime PM and implemented power gating on top of that, we ended up with more code than just using the current ACM code. I have a developer starting to look into how we could take runtime PM again into use with proper power gating support. It'll take a while to get that right. It might be best to rip the dynamic power management out from this patch set, and introduce it later when we have a proper runtime PM solution. I'll skip the other comments about ACM because of this. >> diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c > [...] >> +struct nvhost_master *nvhost; > Bad habbit. I know that this is a popular shortcut. However this also > leads to very bad designs because you're allowed to reuse this pointer > from wherever you like. > > When I wrote the tegra-drm code I explicitly made sure to not use any > such global variable. In the end it forces you to clean up the driver > design. > > As a bonus you automatically get support for any number of host1x > devices on the same SoC. Now you will probably tell me that this is > never going to happen. People also used to think that a computers would > never use more than a single CPU... I think this might get cleaned up at the same time with cleaning up the auxdata/chip_ops design. We used to have this struct set as driver private data, but as we started using that for another purpose, we moved this variable out. >> + >> +static void nvhost_free_resources(struct nvhost_master *host) >> +{ >> +} > > This should be removed since it's empty. True. I wonder how that happened - there was content since recently, but I guess I deleted the code without noticing that the function needs to go, too. >> + regs = platform_get_resource(dev, IORESOURCE_MEM, 0); >> + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0); >> + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1); >> + >> + if (!regs || !intr0 || !intr1) { > > I prefer to have these checked for explicitly, one by one for > readability and potentially more useful diagnostics. Can do. > Also you should be using platform_get_irq() for interrupts. Furthermore > the host1x DT node (and the TRM) name the interrupts "syncpt" and > "general", so maybe those would be more useful variable names than > "intr0" and "intr1". > > But since you don't use them anyway they shouldn't be part of this > patch. True. I might also as well delete the general interrupt altogether, as we don't use it for any real purpose. >> + /* Copy host1x parameters. The private_data gets replaced >> + * by nvhost_master later */ > > Multiline comments should be in this format: > > /* > * foo > */ Ok. >> + host->aperture = devm_request_and_ioremap(&dev->dev, regs); >> + if (!host->aperture) { > > aperture is confusing as it is typically used for GTT-type memory > regions, so it may be mistaken for the GART found on Tegra 2. Why not > call it "regs" instead? Can do. > >> + dev_err(&dev->dev, "failed to remap host registers\n"); > > This is unnecessary. devm_request_and_ioremap() already prints an error > message on failure. I'll remove that, thanks. > >> + for (i = 0; i < pdata->num_clks; i++) >> + clk_prepare_enable(pdata->clk[i]); >> + nvhost_syncpt_reset(&host->syncpt); >> + for (i = 0; i < pdata->num_clks; i++) >> + clk_disable_unprepare(pdata->clk[i]); > > Stephen already hinted at this when discussing the AUXDATA. You should > explicitly request the clocks. I'm not too happy about that idea. The clock management code is generic for all modules, and that's why it's driven by a data structure. Now Stephen and you ask me to unroll the loops and make copies of the code to facilitate different modules and different SoCs. >> +static int __exit nvhost_remove(struct platform_device *dev) > > This should really be __devexit to allow the driver to be built as a > module. However, __dev* are deprecated and in the process of being > removed so you can just drop __exit as well. >> +static struct of_device_id host1x_match[] __devinitdata = { > > __devinitdata can be dropped. >> + { .compatible = "nvidia,tegra20-host1x", }, >> + { .compatible = "nvidia,tegra30-host1x", }, >> + { }, >> +}; >> + >> +static struct platform_driver platform_driver = { >> + .probe = nvhost_probe, >> + .remove = __exit_p(nvhost_remove), > > __exit_p also. Ok. >> + .of_match_table = of_match_ptr(host1x_match), > > No need for of_match_ptr(). Will remove. >> +module_init(nvhost_mod_init); >> +module_exit(nvhost_mod_exit); > > Use module_platform_driver(). Ok. > >> diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h > [...] >> +#define TRACE_MAX_LENGTH 128U >> +#define IFACE_NAME "nvhost" > > None of these seem to be used. Will remove. >> +static inline void *nvhost_get_private_data(struct platform_device *_dev) >> +{ >> + struct nvhost_device_data *pdata = >> + (struct nvhost_device_data *)platform_get_drvdata(_dev); >> + WARN_ON(!pdata); >> + return (pdata && pdata->private_data) ? pdata->private_data : NULL; >> +} >> + >> +static inline void nvhost_set_private_data(struct platform_device *_dev, >> + void *priv_data) >> +{ >> + struct nvhost_device_data *pdata = >> + (struct nvhost_device_data *)platform_get_drvdata(_dev); >> + WARN_ON(!pdata); >> + if (pdata) >> + pdata->private_data = priv_data; >> +} > > You should need none of these. Instead put all the data you need into > you struct host1x and associate that with the platform device using > platform_set_drvdata(). I need to actually find a way to do this for both host1x itself, and the 2D module. But as said, I'll try to remove the auxdata and come up with something better. >> +static inline >> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev) >> +{ >> + struct platform_device *pdev; >> + >> + if (_dev->dev.parent) { >> + pdev = to_platform_device(_dev->dev.parent); >> + return nvhost_get_private_data(pdev); >> + } else >> + return nvhost_get_private_data(_dev); >> +} >> + >> +static inline >> +struct platform_device *nvhost_get_parent(struct platform_device *_dev) >> +{ >> + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL; >> +} > > These don't seem to be used. nvhost_get_host() is used in a subsequent patch, but getting parent doesn't seem to be. > Usually you don't keep separate variables for subregions. This can > equally well be done with just adding a corresponding offset. Hmm, ok, I could do that, but it just sounds logical to have only one piece of code that finds the sync reg base. I don't really understand why it needs to be duplicate for every access. >> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp) >> +{ >> + u32 i; >> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { >> + u32 max = nvhost_syncpt_read_max(sp, i); >> + u32 min = nvhost_syncpt_update_min(sp, i); >> + if (!max && !min) >> + continue; >> + dev_info(&syncpt_to_dev(sp)->dev->dev, >> + "id %d (%s) min %d max %d\n", >> + i, syncpt_op().name(sp, i), >> + min, max); >> + >> + } >> + >> + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) { >> + u32 base_val; >> + host1x_syncpt_read_wait_base(sp, i); >> + base_val = sp->base_val[i]; >> + if (base_val) >> + dev_info(&syncpt_to_dev(sp)->dev->dev, >> + "waitbase id %d val %d\n", >> + i, base_val); >> + >> + } >> +} > > This should probably be integrated with debugfs. I could move this to debug.c, but it's debugging aid when a command stream is misbehaving and it spews this to UART when sync point wait is timing out. So not debugfs stuff. >> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h > > Autogenerated files are generally not acceptable. And I already > mentioned before that you should be using #define instead of static > inline functions for register and bit definitions. What's the root cause for autogenerated files not being acceptable? I'm autogenerating them from definitions I get from hardware, which makes the results reliable. I like static inline because I get the benefit of compiler type checking, and gcov shows me which register definitions have been used in different tests. #defines are always messy and I pretty much hate them. But if the general request is to use #define's, even though I don't agree, I can accommodate. It's simple to write a sed script to do the conversion. > This is a very funky way of creating sysfs attributes. What you probably > want here are device attributes. See Documentation/filesystems/sysfs.txt > and include/linux/sysfs.h. Thanks for the pointers, looks like device attributes could be used. >> +bool nvhost_syncpt_is_expired( >> + struct nvhost_syncpt *sp, >> + u32 id, >> + u32 thresh) >> +{ >> + u32 current_val; >> + u32 future_val; >> + smp_rmb(); > > What do you need a read memory barrier for? I'll test without that. I can't see any valid reason, and I have a couple of other similar problems. >> +/* Displays the current value of the sync point via sysfs */ >> +static ssize_t syncpt_min_show(struct kobject *kobj, >> + struct kobj_attribute *attr, char *buf) >> +{ >> + struct nvhost_syncpt_attr *syncpt_attr = >> + container_of(attr, struct nvhost_syncpt_attr, attr); >> + >> + return snprintf(buf, PAGE_SIZE, "%u", >> + nvhost_syncpt_read(&syncpt_attr->host->syncpt, >> + syncpt_attr->id)); >> +} >> + >> +static ssize_t syncpt_max_show(struct kobject *kobj, >> + struct kobj_attribute *attr, char *buf) >> +{ >> + struct nvhost_syncpt_attr *syncpt_attr = >> + container_of(attr, struct nvhost_syncpt_attr, attr); >> + >> + return snprintf(buf, PAGE_SIZE, "%u", >> + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt, >> + syncpt_attr->id)); >> +} > > This looks like it belongs in debugfs. This is actually the only interface to read the max value to user space, which can be useful for doing some comparisons that take wrapping into account. But we could just add IOCTLs and remove the sysfs entries. >> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h > [...] >> +struct host1x_device_info { >> + int nb_channels; /* host1x: num channels supported */ >> + int nb_pts; /* host1x: num syncpoints supported */ >> + int nb_bases; /* host1x: num syncpoints supported */ >> + u32 client_managed; /* host1x: client managed syncpts */ >> + int nb_mlocks; /* host1x: number of mlocks */ >> + const char **syncpt_names; /* names of sync points */ >> +}; >> + >> +struct nvhost_device_data { >> + int version; /* ip version number of device */ >> + int id; /* Separates clients of same hw */ >> + int index; /* Hardware channel number */ >> + void __iomem *aperture; /* Iomem mapped to kernel */ >> + >> + u32 syncpts; /* Bitfield of sync points used */ >> + u32 modulemutexes; /* Bit field of module mutexes */ >> + >> + u32 class; /* Device class */ >> + bool serialize; /* Serialize submits in the channel */ >> + >> + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS]; >> + bool can_powergate; /* True if module can be power gated */ >> + int clockgate_delay;/* Delay before clock gated */ >> + int powergate_delay;/* Delay before power gated */ >> + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */ >> + >> + struct delayed_work powerstate_down;/* Power state management */ >> + int num_clks; /* Number of clocks opened for dev */ >> + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS]; >> + struct mutex lock; /* Power management lock */ >> + int powerstate; /* Current power state */ >> + int refcount; /* Number of tasks active */ >> + wait_queue_head_t idle_wq; /* Work queue for idle */ >> + >> + struct nvhost_channel *channel; /* Channel assigned for the module */ >> + struct kobject *power_kobj; /* kobj to hold power sysfs entries */ >> + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */ >> + struct dentry *debugfs; /* debugfs directory */ >> + >> + void *private_data; /* private platform data */ >> + struct platform_device *pdev; /* owner platform_device */ >> + >> + /* Finalize power on. Can be used for context restore. */ >> + void (*finalize_poweron)(struct platform_device *dev); >> + >> + /* Device is busy. */ >> + void (*busy)(struct platform_device *); >> + >> + /* Device is idle. */ >> + void (*idle)(struct platform_device *); >> + >> + /* Device is going to be suspended */ >> + void (*suspend_ndev)(struct platform_device *); >> + >> + /* Device is initialized */ >> + void (*init)(struct platform_device *dev); >> + >> + /* Device is de-initialized. */ >> + void (*deinit)(struct platform_device *dev); >> + >> + /* Preparing for power off. Used for context save. */ >> + int (*prepare_poweroff)(struct platform_device *dev); >> + >> + /* Clock gating callbacks */ >> + int (*prepare_clockoff)(struct platform_device *dev); >> + void (*finalize_clockon)(struct platform_device *dev); >> +}; > > A lot of this can be removed if you use existing infrastructure and > simplify the design a bit. Most of it can probably move into the main > struct host1x to avoid needless indirections that make the code hard to > follow and maintain. Actually, this struct is generic for host1x and host1x clients, so they cannot be moved to host1x. I do also realize that I'm not using them in the patch sets I sent for 2D. Terje
On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote: > On 28.11.2012 23:23, Thierry Reding wrote: > > This could be problematic. Since drivers/video and drivers/gpu/drm are > > separate trees, this would entail a continuous burden on keeping both > > trees synchronized. While I realize that eventually it might be better > > to put the host1x driver in a separate place to accomodate for its use > > by other subsystems, I'm not sure moving it here right away is the best > > approach. > > I understand your point, but I hope also that we'd end up with something > that can be used as basis for the downstream kernel to migrate to > upstream stack. > > The key point here is to make the API between nvhost and tegradrm as > small and robust to changes as possible. I agree. But I also fear that there will be changes eventually and having both go in via different tree requires those trees to be merged in a specific order to avoid breakage should the API change. This will be particularly ugly in linux-next. That's why I explicitly proposed to take this into drivers/gpu/drm/tegra for the time being, until we can be reasonably sure that the API is fixed. Then I'm fine with moving it wherever seems the best fit. Even then there might be the occasional dependency, but they should get fewer and fewer as the code matures. > >> +struct nvhost_syncpt_ops { > >> + void (*reset)(struct nvhost_syncpt *, u32 id); > >> + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id); > >> + void (*read_wait_base)(struct nvhost_syncpt *, u32 id); > >> + u32 (*update_min)(struct nvhost_syncpt *, u32 id); > >> + void (*cpu_incr)(struct nvhost_syncpt *, u32 id); > >> + void (*debug)(struct nvhost_syncpt *); > >> + const char * (*name)(struct nvhost_syncpt *, u32 id); > >> +}; > > > > Why are these even defined as ops structure? Tegra20 and Tegra30 seem to > > be compatible when it comes to handling syncpoints. I thought they would > > even be compatible in all other aspects as well, so why even have this? > > Tegra20 and Tegra30 are compatible, but future chips are not. I was > hoping we would be ready in upstream kernel for future chips. I think we should ignore that problem for now. Generally planning for any possible combination of incompatibilities leads to overgeneralized designs that require precisely these kinds of indirections. Once some documentation for Tegra 40 materializes we can start thinking about how to encapsulate the incompatible code. > >> +#define syncpt_op() (nvhost_get_chip_ops()->syncpt) > > > > You really shouldn't be doing this, but rather use explicit accesses for > > these structures. If you're design doesn't scatter these definitions > > across several files then it isn't difficult to obtain the correct > > pointers and you don't need these "shortcuts". > > Do you mean that I would move the ops to be part of f.ex. nvhost_syncpt > or nvhost_intr structs? Not quite. What I'm saying is that unless there is a reason to encapsulate the functions into an ops structure (for instance because of incompatibilities across chip generations) they shouldn't be pointers in a struct at all. For that matter I don't think you need the nvhost_syncpt and nvhost_intr structures either. > >> +bool host1x_powered(struct platform_device *dev) > >> +{ > > [...] > >> +} > >> +EXPORT_SYMBOL(host1x_powered); > >> + > >> +void host1x_busy(struct platform_device *dev) > >> +{ > > [...] > >> +} > >> +EXPORT_SYMBOL(host1x_busy); > >> + > >> +void host1x_idle(struct platform_device *dev) > >> +{ > > [...] > >> +} > >> +EXPORT_SYMBOL(host1x_idle); > > > > These look like a reimplementation of the runtime power-management > > framework. > > Yes, we at some point tried to move to use runtime PM. The first attempt > was thwarted by runtime PM and system suspend conflicting with each > other. I believe this is pretty much fixed in later versions of kernel. > > Also, the problem was that runtime PM doesn't support multiple power > states. In downstream kernel, we support clock gating and power gating. > When we moved to runtime PM and implemented power gating on top of that, > we ended up with more code than just using the current ACM code. > > I have a developer starting to look into how we could take runtime PM > again into use with proper power gating support. It'll take a while to > get that right. It might be best to rip the dynamic power management out > from this patch set, and introduce it later when we have a proper > runtime PM solution. Okay, sounds like a plan. Even if it turns out that the current runtime PM implementation doesn't provide every functionality that you need, we should try to get these changes into the existing frameworks instead of copying large chunks of code. > >> +static void nvhost_free_resources(struct nvhost_master *host) > >> +{ > >> +} > > > > This should be removed since it's empty. > > True. I wonder how that happened - there was content since recently, but > I guess I deleted the code without noticing that the function needs to > go, too. I noticed that it was filled with content in one of the subsequent patches. Depending on how this gets merged eventually you could postpone adding the function until the later patch. But perhaps once the code has been properly reviewed we can just squash the patches again. We'll see. > > Also you should be using platform_get_irq() for interrupts. Furthermore > > the host1x DT node (and the TRM) name the interrupts "syncpt" and > > "general", so maybe those would be more useful variable names than > > "intr0" and "intr1". > > > > But since you don't use them anyway they shouldn't be part of this > > patch. > > True. I might also as well delete the general interrupt altogether, as > we don't use it for any real purpose. I think it might still be useful for diagnostics. It seems to be used when writes time out. That could still be helpful information when debugging problems. > >> + for (i = 0; i < pdata->num_clks; i++) > >> + clk_prepare_enable(pdata->clk[i]); > >> + nvhost_syncpt_reset(&host->syncpt); > >> + for (i = 0; i < pdata->num_clks; i++) > >> + clk_disable_unprepare(pdata->clk[i]); > > > > Stephen already hinted at this when discussing the AUXDATA. You should > > explicitly request the clocks. > > I'm not too happy about that idea. The clock management code is generic > for all modules, and that's why it's driven by a data structure. Now > Stephen and you ask me to unroll the loops and make copies of the code > to facilitate different modules and different SoCs. Making this generic for all modules may not be what we want as it doesn't allow devices to handle things themselves if necessary. Clock management is just part of the boiler plate that every driver is supposed to cope with. Also the number of clocks is usually not higher than 2 or 3, so the pain is manageable. =) Furthermore doing this in loops may not work for all modules. Some may require additional delays between enabling the clocks, others may be able to selectively disable one clock but not the other(s). > >> +static inline void *nvhost_get_private_data(struct platform_device *_dev) > >> +{ > >> + struct nvhost_device_data *pdata = > >> + (struct nvhost_device_data *)platform_get_drvdata(_dev); > >> + WARN_ON(!pdata); > >> + return (pdata && pdata->private_data) ? pdata->private_data : NULL; > >> +} > >> + > >> +static inline void nvhost_set_private_data(struct platform_device *_dev, > >> + void *priv_data) > >> +{ > >> + struct nvhost_device_data *pdata = > >> + (struct nvhost_device_data *)platform_get_drvdata(_dev); > >> + WARN_ON(!pdata); > >> + if (pdata) > >> + pdata->private_data = priv_data; > >> +} > > > > You should need none of these. Instead put all the data you need into > > you struct host1x and associate that with the platform device using > > platform_set_drvdata(). > > I need to actually find a way to do this for both host1x itself, and the > 2D module. But as said, I'll try to remove the auxdata and come up with > something better. The existing host1x and DRM code could serve as an example since I explicitly wrote them to behave properly. > >> +static inline > >> +struct nvhost_master *nvhost_get_host(struct platform_device *_dev) > >> +{ > >> + struct platform_device *pdev; > >> + > >> + if (_dev->dev.parent) { > >> + pdev = to_platform_device(_dev->dev.parent); > >> + return nvhost_get_private_data(pdev); > >> + } else > >> + return nvhost_get_private_data(_dev); > >> +} > >> + > >> +static inline > >> +struct platform_device *nvhost_get_parent(struct platform_device *_dev) > >> +{ > >> + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL; > >> +} > > > > These don't seem to be used. > > nvhost_get_host() is used in a subsequent patch, but getting parent > doesn't seem to be. Again, if you look at the existing tegra-drm code, the client modules already use something a bit more explicit to obtain a reference to the host1x: struct host1x *host1x = dev_get_drvdata(pdev->dev.parent); The good thing about it is that it very clearly says where the host1x pointer should be coming from. Explicitness is good. > > Usually you don't keep separate variables for subregions. This can > > equally well be done with just adding a corresponding offset. > > Hmm, ok, I could do that, but it just sounds logical to have only one > piece of code that finds the sync reg base. I don't really understand > why it needs to be duplicate for every access. You wouldn't actually be duplicating it. Rather you'd just add another offset. But I commented on this more explicitly in a reply to one of the other patches. > >> +static void host1x_syncpt_debug(struct nvhost_syncpt *sp) > >> +{ > >> + u32 i; > >> + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { > >> + u32 max = nvhost_syncpt_read_max(sp, i); > >> + u32 min = nvhost_syncpt_update_min(sp, i); > >> + if (!max && !min) > >> + continue; > >> + dev_info(&syncpt_to_dev(sp)->dev->dev, > >> + "id %d (%s) min %d max %d\n", > >> + i, syncpt_op().name(sp, i), > >> + min, max); > >> + > >> + } > >> + > >> + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) { > >> + u32 base_val; > >> + host1x_syncpt_read_wait_base(sp, i); > >> + base_val = sp->base_val[i]; > >> + if (base_val) > >> + dev_info(&syncpt_to_dev(sp)->dev->dev, > >> + "waitbase id %d val %d\n", > >> + i, base_val); > >> + > >> + } > >> +} > > > > This should probably be integrated with debugfs. > > I could move this to debug.c, but it's debugging aid when a command > stream is misbehaving and it spews this to UART when sync point wait is > timing out. So not debugfs stuff. Okay, in that case it should stay in. Perhaps convert dev_info() to dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG guards would also be useful. Maybe not. > >> diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h > > > > Autogenerated files are generally not acceptable. And I already > > mentioned before that you should be using #define instead of static > > inline functions for register and bit definitions. > > What's the root cause for autogenerated files not being acceptable? I'm > autogenerating them from definitions I get from hardware, which makes > the results reliable. The problem is not with autogenerated files in general. The means by which they are generated are less important. However, autogenerated files often contain a lot of unneeded definitions and contain things such as "autogenerated - do not edit" lines. So generally if you generate the content using some scripts to make sure it corresponds to what engineering gave you, that's okay as long as you make sure it has the correct form and doesn't contain any cruft. > I like static inline because I get the benefit of compiler type > checking, and gcov shows me which register definitions have been used in > different tests. Type checking shouldn't be necessary for simple defines. And I wasn't aware that you could get the Linux kernel to write out data to be fed to gcov. > #defines are always messy and I pretty much hate them. But if the > general request is to use #define's, even though I don't agree, I can > accommodate. It's simple to write a sed script to do the conversion. There are a lot of opportunities to abuse #defines but they are harmless for register definitions. The Linux kernel is full of them and I haven't yet seen any code that uses static inline functions for this purpose. What you need to consider as well is that many people that work with the Linux kernel expect code to be in a certain style. Register accesses of the form writel(value, base + OFFSET); are very common and expected to look a certain way, so if you write code that doesn't comply with these guidelines you make it extra hard for people to read the code. And that'll cost extra time, which people don't usually have in excess. > >> +/* Displays the current value of the sync point via sysfs */ > >> +static ssize_t syncpt_min_show(struct kobject *kobj, > >> + struct kobj_attribute *attr, char *buf) > >> +{ > >> + struct nvhost_syncpt_attr *syncpt_attr = > >> + container_of(attr, struct nvhost_syncpt_attr, attr); > >> + > >> + return snprintf(buf, PAGE_SIZE, "%u", > >> + nvhost_syncpt_read(&syncpt_attr->host->syncpt, > >> + syncpt_attr->id)); > >> +} > >> + > >> +static ssize_t syncpt_max_show(struct kobject *kobj, > >> + struct kobj_attribute *attr, char *buf) > >> +{ > >> + struct nvhost_syncpt_attr *syncpt_attr = > >> + container_of(attr, struct nvhost_syncpt_attr, attr); > >> + > >> + return snprintf(buf, PAGE_SIZE, "%u", > >> + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt, > >> + syncpt_attr->id)); > >> +} > > > > This looks like it belongs in debugfs. > > This is actually the only interface to read the max value to user space, > which can be useful for doing some comparisons that take wrapping into > account. But we could just add IOCTLs and remove the sysfs entries. Maybe you can explain the usefulness of this some more. Why would it be easier to look at them in sysfs than in debugfs? You could be providing a simple list of syncpoints along with min/max, name, requested status, etc. in debugfs and it should be as easy to parse for both humans and machines as sysfs. I don't think IOCTLs would be any gain as they tend to have higher ABI stability requirements than debugfs (which doesn't have very strong requirements) or sysfs (which is often considered as a public ABI as well and therefore needs to be stable). > >> diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h > > [...] > >> +struct host1x_device_info { > >> + int nb_channels; /* host1x: num channels supported */ > >> + int nb_pts; /* host1x: num syncpoints supported */ > >> + int nb_bases; /* host1x: num syncpoints supported */ > >> + u32 client_managed; /* host1x: client managed syncpts */ > >> + int nb_mlocks; /* host1x: number of mlocks */ > >> + const char **syncpt_names; /* names of sync points */ > >> +}; > >> + > >> +struct nvhost_device_data { > >> + int version; /* ip version number of device */ > >> + int id; /* Separates clients of same hw */ > >> + int index; /* Hardware channel number */ > >> + void __iomem *aperture; /* Iomem mapped to kernel */ > >> + > >> + u32 syncpts; /* Bitfield of sync points used */ > >> + u32 modulemutexes; /* Bit field of module mutexes */ > >> + > >> + u32 class; /* Device class */ > >> + bool serialize; /* Serialize submits in the channel */ > >> + > >> + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS]; > >> + bool can_powergate; /* True if module can be power gated */ > >> + int clockgate_delay;/* Delay before clock gated */ > >> + int powergate_delay;/* Delay before power gated */ > >> + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */ > >> + > >> + struct delayed_work powerstate_down;/* Power state management */ > >> + int num_clks; /* Number of clocks opened for dev */ > >> + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS]; > >> + struct mutex lock; /* Power management lock */ > >> + int powerstate; /* Current power state */ > >> + int refcount; /* Number of tasks active */ > >> + wait_queue_head_t idle_wq; /* Work queue for idle */ > >> + > >> + struct nvhost_channel *channel; /* Channel assigned for the module */ > >> + struct kobject *power_kobj; /* kobj to hold power sysfs entries */ > >> + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */ > >> + struct dentry *debugfs; /* debugfs directory */ > >> + > >> + void *private_data; /* private platform data */ > >> + struct platform_device *pdev; /* owner platform_device */ > >> + > >> + /* Finalize power on. Can be used for context restore. */ > >> + void (*finalize_poweron)(struct platform_device *dev); > >> + > >> + /* Device is busy. */ > >> + void (*busy)(struct platform_device *); > >> + > >> + /* Device is idle. */ > >> + void (*idle)(struct platform_device *); > >> + > >> + /* Device is going to be suspended */ > >> + void (*suspend_ndev)(struct platform_device *); > >> + > >> + /* Device is initialized */ > >> + void (*init)(struct platform_device *dev); > >> + > >> + /* Device is de-initialized. */ > >> + void (*deinit)(struct platform_device *dev); > >> + > >> + /* Preparing for power off. Used for context save. */ > >> + int (*prepare_poweroff)(struct platform_device *dev); > >> + > >> + /* Clock gating callbacks */ > >> + int (*prepare_clockoff)(struct platform_device *dev); > >> + void (*finalize_clockon)(struct platform_device *dev); > >> +}; > > > > A lot of this can be removed if you use existing infrastructure and > > simplify the design a bit. Most of it can probably move into the main > > struct host1x to avoid needless indirections that make the code hard to > > follow and maintain. > > Actually, this struct is generic for host1x and host1x clients, so they > cannot be moved to host1x. I do also realize that I'm not using them in > the patch sets I sent for 2D. I've said this before, and I think that this tries to be overly generic. Display controllers for instance work quite well without an attached nvhost_channel. Thierry
On 11/29/2012 03:21 AM, Terje Bergström wrote: > On 28.11.2012 23:23, Thierry Reding wrote: ... >>> + regs = platform_get_resource(dev, IORESOURCE_MEM, 0); >>> + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0); >>> + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1); >>> + >>> + if (!regs || !intr0 || !intr1) { >> >> I prefer to have these checked for explicitly, one by one for >> readability and potentially more useful diagnostics. > > Can do. > >> Also you should be using platform_get_irq() for interrupts. Furthermore >> the host1x DT node (and the TRM) name the interrupts "syncpt" and >> "general", so maybe those would be more useful variable names than >> "intr0" and "intr1". >> >> But since you don't use them anyway they shouldn't be part of this >> patch. > > True. I might also as well delete the general interrupt altogether, as > we don't use it for any real purpose. Do make sure the interrupts still are part of the DT binding though, so that the binding fully describes the HW, and the interrupt is available to retrieve if we ever do use it in the future. >>> + for (i = 0; i < pdata->num_clks; i++) >>> + clk_prepare_enable(pdata->clk[i]); >>> + nvhost_syncpt_reset(&host->syncpt); >>> + for (i = 0; i < pdata->num_clks; i++) >>> + clk_disable_unprepare(pdata->clk[i]); >> >> Stephen already hinted at this when discussing the AUXDATA. You should >> explicitly request the clocks. > > I'm not too happy about that idea. The clock management code is generic > for all modules, and that's why it's driven by a data structure. Now > Stephen and you ask me to unroll the loops and make copies of the code > to facilitate different modules and different SoCs. You can still create tables of clocks inside the driver and loop over them. So, loop unrolling isn't related to my comments at least. It's just that clk_get() shouldn't take its parameters from platform data. But if these are clocks for (arbitrary) child modules (that may or may not exist dynamically), why aren't the drivers for the child modules managing them?
On 11/29/2012 04:47 AM, Thierry Reding wrote: > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote: >> On 28.11.2012 23:23, Thierry Reding wrote: >>> This could be problematic. Since drivers/video and >>> drivers/gpu/drm are separate trees, this would entail a >>> continuous burden on keeping both trees synchronized. While I >>> realize that eventually it might be better to put the host1x >>> driver in a separate place to accomodate for its use by other >>> subsystems, I'm not sure moving it here right away is the best >>> approach. >> >> I understand your point, but I hope also that we'd end up with >> something that can be used as basis for the downstream kernel to >> migrate to upstream stack. >> >> The key point here is to make the API between nvhost and tegradrm >> as small and robust to changes as possible. > > I agree. But I also fear that there will be changes eventually and > having both go in via different tree requires those trees to be > merged in a specific order to avoid breakage should the API change. > This will be particularly ugly in linux-next. > > That's why I explicitly proposed to take this into > drivers/gpu/drm/tegra for the time being, until we can be > reasonably sure that the API is fixed. Then I'm fine with moving it > wherever seems the best fit. Even then there might be the > occasional dependency, but they should get fewer and fewer as the > code matures. It is acceptable for one maintainer to ack patches, and another maintainer to merge a series that touches both "their own" code and code owned by another tree. This should of course only be needed when inter-module APIs change; changes to code within a module shouldn't require this.
On Thu, Nov 29, 2012 at 11:38:11AM -0700, Stephen Warren wrote: > On 11/29/2012 04:47 AM, Thierry Reding wrote: > > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote: > >> On 28.11.2012 23:23, Thierry Reding wrote: > >>> This could be problematic. Since drivers/video and > >>> drivers/gpu/drm are separate trees, this would entail a > >>> continuous burden on keeping both trees synchronized. While I > >>> realize that eventually it might be better to put the host1x > >>> driver in a separate place to accomodate for its use by other > >>> subsystems, I'm not sure moving it here right away is the best > >>> approach. > >> > >> I understand your point, but I hope also that we'd end up with > >> something that can be used as basis for the downstream kernel to > >> migrate to upstream stack. > >> > >> The key point here is to make the API between nvhost and tegradrm > >> as small and robust to changes as possible. > > > > I agree. But I also fear that there will be changes eventually and > > having both go in via different tree requires those trees to be > > merged in a specific order to avoid breakage should the API change. > > This will be particularly ugly in linux-next. > > > > That's why I explicitly proposed to take this into > > drivers/gpu/drm/tegra for the time being, until we can be > > reasonably sure that the API is fixed. Then I'm fine with moving it > > wherever seems the best fit. Even then there might be the > > occasional dependency, but they should get fewer and fewer as the > > code matures. > > It is acceptable for one maintainer to ack patches, and another > maintainer to merge a series that touches both "their own" code and > code owned by another tree. This should of course only be needed when > inter-module APIs change; changes to code within a module shouldn't > require this. Yes, that's true. But it still makes things more complicated since each of the maintainers will have to do extra work to test the changes. Anyway we'll see how this plays out. The ideal case would of course be to get the API right from the start. =) Thierry
On Fri, Nov 30, 2012 at 08:54:32AM +0200, Terje Bergström wrote: > On 29.11.2012 20:34, Stephen Warren wrote: > > On 11/29/2012 03:21 AM, Terje Bergström wrote: > >> True. I might also as well delete the general interrupt altogether, as > >> we don't use it for any real purpose. > > > > Do make sure the interrupts still are part of the DT binding though, so > > that the binding fully describes the HW, and the interrupt is available > > to retrieve if we ever do use it in the future. > > Sure, I will just not use the generic irq in DT, but it won't require > any changes in DT bindings. > > > You can still create tables of clocks inside the driver and loop over > > them. So, loop unrolling isn't related to my comments at least. It's > > just that clk_get() shouldn't take its parameters from platform data. > > > > But if these are clocks for (arbitrary) child modules (that may or may > > not exist dynamically), why aren't the drivers for the child modules > > managing them? > > There are actually two things here that I mixed, and because of that I > probably confused everybody else. > > Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns > the clocks. That's why list of clocks and their frequency policies have > been part of the device description in nvhost. ACM is being replaced > with runtime PM in downstream kernel, but it still requires rigorous > testing and analysis of power profile before we can move to it. > > Then, the second thing is that nvhost_probe() has had its own loop to go > through the clocks of host1x module. It's copy-paste of what ACM did, > which is just bad design. That's easily replaceable with static code, as > nvhost_probe() is just for host1x. I'll do that, and as I rip out the > generic power management code, I'll also make 2D and host1x drivers > enable the clocks at probe with static code. > > So I think we have a solution that resonates with all proposals. Yes, that sounds good to me. Thierry
On 29.11.2012 20:34, Stephen Warren wrote: > On 11/29/2012 03:21 AM, Terje Bergström wrote: >> True. I might also as well delete the general interrupt altogether, as >> we don't use it for any real purpose. > > Do make sure the interrupts still are part of the DT binding though, so > that the binding fully describes the HW, and the interrupt is available > to retrieve if we ever do use it in the future. Sure, I will just not use the generic irq in DT, but it won't require any changes in DT bindings. > You can still create tables of clocks inside the driver and loop over > them. So, loop unrolling isn't related to my comments at least. It's > just that clk_get() shouldn't take its parameters from platform data. > > But if these are clocks for (arbitrary) child modules (that may or may > not exist dynamically), why aren't the drivers for the child modules > managing them? There are actually two things here that I mixed, and because of that I probably confused everybody else. Let's rip out the ACM. ACM is generic to all modules, and in nvhost owns the clocks. That's why list of clocks and their frequency policies have been part of the device description in nvhost. ACM is being replaced with runtime PM in downstream kernel, but it still requires rigorous testing and analysis of power profile before we can move to it. Then, the second thing is that nvhost_probe() has had its own loop to go through the clocks of host1x module. It's copy-paste of what ACM did, which is just bad design. That's easily replaceable with static code, as nvhost_probe() is just for host1x. I'll do that, and as I rip out the generic power management code, I'll also make 2D and host1x drivers enable the clocks at probe with static code. So I think we have a solution that resonates with all proposals. Best regards, Terje
Am Donnerstag, den 29.11.2012, 11:38 -0700 schrieb Stephen Warren: > On 11/29/2012 04:47 AM, Thierry Reding wrote: > > I agree. But I also fear that there will be changes eventually and > > having both go in via different tree requires those trees to be > > merged in a specific order to avoid breakage should the API change. > > This will be particularly ugly in linux-next. > > > > That's why I explicitly proposed to take this into > > drivers/gpu/drm/tegra for the time being, until we can be > > reasonably sure that the API is fixed. Then I'm fine with moving it > > wherever seems the best fit. Even then there might be the > > occasional dependency, but they should get fewer and fewer as the > > code matures. > > It is acceptable for one maintainer to ack patches, and another > maintainer to merge a series that touches both "their own" code and > code owned by another tree. This should of course only be needed when > inter-module APIs change; changes to code within a module shouldn't > require this. > I'm with Thierry here. I think there is a fair chance that we won't get the API right from the start, even when trying to come up with something that sounds sane to everyone. It's also not desirable to delay gr2d going into mainline until we are all completely satisfied with the API. I also fail to see how host1x module being in the DRM directory hinders any downstream development. So I'm in favour of keeping host1x besides the other DRM components to lower the burden for API changes and move it out into some more generic directory, once we feel confident that the API is reasonable stable. Regards, Lucas
On 29.11.2012 13:47, Thierry Reding wrote: > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote: >> Tegra20 and Tegra30 are compatible, but future chips are not. I was >> hoping we would be ready in upstream kernel for future chips. > > I think we should ignore that problem for now. Generally planning for > any possible combination of incompatibilities leads to overgeneralized > designs that require precisely these kinds of indirections. > > Once some documentation for Tegra 40 materializes we can start thinking > about how to encapsulate the incompatible code. I think here our perspectives differ a lot. That is natural considering the company I work for and company you work for, so let's try to sync the perspective. In my reality, whatever is in market is old news and I barely work on them anymore. Upstreaming activity is the exception. 90% of my time is spent dealing with future chips which I know cannot be handled without this split to logical and physical driver parts. For you, Tegra2 and Tegra3 are the reality. If we move nvhost in upstream a bit incompatible, that's fine, like ripping out features or adding new new stuff, like a new memory type. All of this I can support with a good diff tool to get all the patches flowing between upstream and downstream. If we do fundamental changes that prevent bringing the code back to downstream, like removing this abstraction, the whole process of upstream and downstream converging hits a brick wall. We wouldn't have proper continuing co-operation, but just pushing code out and being done with it. > I noticed that it was filled with content in one of the subsequent > patches. Depending on how this gets merged eventually you could postpone > adding the function until the later patch. But perhaps once the code has > been properly reviewed we can just squash the patches again. We'll see. Ok, thanks. >> True. I might also as well delete the general interrupt altogether, as >> we don't use it for any real purpose. > > I think it might still be useful for diagnostics. It seems to be used > when writes time out. That could still be helpful information when > debugging problems. It's actually a stale comment. The client units are not signaling anything useful with the interrupt. There's use for it in downstream, but that's irrelevant here. > Making this generic for all modules may not be what we want as it > doesn't allow devices to handle things themselves if necessary. Clock > management is just part of the boiler plate that every driver is > supposed to cope with. Also the number of clocks is usually not higher > than 2 or 3, so the pain is manageable. =) > > Furthermore doing this in loops may not work for all modules. Some may > require additional delays between enabling the clocks, others may be > able to selectively disable one clock but not the other(s). Yes, but I'll just rip the power management code out, so we can postpone this until we have validated and verified the runtime PM mechanism downstream. >> I could move this to debug.c, but it's debugging aid when a command >> stream is misbehaving and it spews this to UART when sync point wait is >> timing out. So not debugfs stuff. > > Okay, in that case it should stay in. Perhaps convert dev_info() to > dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG > guards would also be useful. Maybe not. I could do that for upstream. In downstream it cannot depend on DEBUG flag, as these spews are an important part of how we debug problems with customer devices and the DEBUG flag is never on in customer builds. > The problem is not with autogenerated files in general. The means by > which they are generated are less important. However, autogenerated > files often contain a lot of unneeded definitions and contain things > such as "autogenerated - do not edit" lines. > > So generally if you generate the content using some scripts to make sure > it corresponds to what engineering gave you, that's okay as long as you > make sure it has the correct form and doesn't contain any cruft. I can remove the boilerplate, that's not a problem. In general, we have tried to be very selective about what we generate, so that it matches what we're using. >> I like static inline because I get the benefit of compiler type >> checking, and gcov shows me which register definitions have been used in >> different tests. > > Type checking shouldn't be necessary for simple defines. And I wasn't > aware that you could get the Linux kernel to write out data to be fed to > gcov. > >> #defines are always messy and I pretty much hate them. But if the >> general request is to use #define's, even though I don't agree, I can >> accommodate. It's simple to write a sed script to do the conversion. > > There are a lot of opportunities to abuse #defines but they are harmless > for register definitions. The Linux kernel is full of them and I haven't > yet seen any code that uses static inline functions for this purpose. My problem is just that I know that the code generated is the same. What we're talking about is that should we let the preprocessor or compiler take care of this. My take is that using preprocessor is not wise - it's the last resort if there's no other proper way of doing things. Preprocessor requires all sorts of extra parenthesis to protect against its deficiencies, and it it merely a tool to do search-and-replace. Even multi-line needs special treatment. > What you need to consider as well is that many people that work with the > Linux kernel expect code to be in a certain style. Register accesses of > the form > > writel(value, base + OFFSET); > > are very common and expected to look a certain way, so if you write code > that doesn't comply with these guidelines you make it extra hard for > people to read the code. And that'll cost extra time, which people don't > usually have in excess. But this has nothing to do with static inline vs. #define anymore, right? > Maybe you can explain the usefulness of this some more. Why would it be > easier to look at them in sysfs than in debugfs? You could be providing > a simple list of syncpoints along with min/max, name, requested status, > etc. in debugfs and it should be as easy to parse for both humans and > machines as sysfs. I don't think IOCTLs would be any gain as they tend > to have higher ABI stability requirements than debugfs (which doesn't > have very strong requirements) or sysfs (which is often considered as a > public ABI as well and therefore needs to be stable). debugfs is just a debugging tool, and user space cannot rely on it. Only developers can rely on existence of debugfs, as they have the means to enable it. sysfs is a place for actual APIs as you mention, and user space can rely on them as proper APIs. That's what the values were exported for. > I've said this before, and I think that this tries to be overly generic. > Display controllers for instance work quite well without an attached > nvhost_channel. Yes, these structures aren't meant to be used by anything else than units that are controlled by the host1x driver. DC, for example, wouldn't have this. Terje
On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote: > On 29.11.2012 13:47, Thierry Reding wrote: > > On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström wrote: > >> Tegra20 and Tegra30 are compatible, but future chips are not. I was > >> hoping we would be ready in upstream kernel for future chips. > > > > I think we should ignore that problem for now. Generally planning for > > any possible combination of incompatibilities leads to overgeneralized > > designs that require precisely these kinds of indirections. > > > > Once some documentation for Tegra 40 materializes we can start thinking > > about how to encapsulate the incompatible code. > > I think here our perspectives differ a lot. That is natural considering > the company I work for and company you work for, so let's try to sync > the perspective. > > In my reality, whatever is in market is old news and I barely work on > them anymore. Upstreaming activity is the exception. 90% of my time is > spent dealing with future chips which I know cannot be handled without > this split to logical and physical driver parts. > > For you, Tegra2 and Tegra3 are the reality. To be fair, Tegra2 and Tegra3 are the reality for *everyone* *outside* NVIDIA. It's great that you spend most of your time working with future chips, but unless you submit the code for inclusion or review nobody upstream needs to be concerned about the implications. Most people don't have time to waste so we naturally try to keep the maintenance burden to a minimum. The above implies that when you submit code it shouldn't contain pieces that prepare for possible future extensions which may or may not be submitted (the exception being if such changes are part of a series where subsequent patches actually use them). The outcome is that the amount of cruft in the mainline kernel is kept to a minimum. And that's a very good thing. > If we move nvhost in upstream a bit incompatible, that's fine, like > ripping out features or adding new new stuff, like a new memory type. > All of this I can support with a good diff tool to get all the patches > flowing between upstream and downstream. > > If we do fundamental changes that prevent bringing the code back to > downstream, like removing this abstraction, the whole process of > upstream and downstream converging hits a brick wall. We wouldn't have > proper continuing co-operation, but just pushing code out and being done > with it. Generally upstream doesn't concern itself with downstream. However we still willingly accept code that is submitted for upstream inclusion independent of where it comes from. The only requirements are that the code conforms to the established standards and has gone through an appropriate amount of review. Downstream maintenance is up to you. If you need to maintain code that doesn't meet the above requirements or that you don't want to submit or haven't got around to yet that's your problem. If you're serious about wanting to derive your downstream kernel from a mainline kernel, then the only realistic way for you to reduce your amount of work is to push your code upstream. And typically the earlier you do so, the better. > >> I could move this to debug.c, but it's debugging aid when a command > >> stream is misbehaving and it spews this to UART when sync point wait is > >> timing out. So not debugfs stuff. > > > > Okay, in that case it should stay in. Perhaps convert dev_info() to > > dev_dbg(). Perhaps wrapping it in some #ifdef CONFIG_TEGRA_HOST1X_DEBUG > > guards would also be useful. Maybe not. > > I could do that for upstream. In downstream it cannot depend on DEBUG > flag, as these spews are an important part of how we debug problems with > customer devices and the DEBUG flag is never on in customer builds. So I've just looked through these patches once more and I can't find where this functionality is actually used. The host1x_syncpt_debug() function is assigned to the nvhost_syncpt_ops.debug member, which in turn is only used by nvhost_syncpt_debug(). The latter, however is never used (not even by the debug infrastructure introduced in patch 4). > >> I like static inline because I get the benefit of compiler type > >> checking, and gcov shows me which register definitions have been used in > >> different tests. > > > > Type checking shouldn't be necessary for simple defines. And I wasn't > > aware that you could get the Linux kernel to write out data to be fed to > > gcov. > > > >> #defines are always messy and I pretty much hate them. But if the > >> general request is to use #define's, even though I don't agree, I can > >> accommodate. It's simple to write a sed script to do the conversion. > > > > There are a lot of opportunities to abuse #defines but they are harmless > > for register definitions. The Linux kernel is full of them and I haven't > > yet seen any code that uses static inline functions for this purpose. > > My problem is just that I know that the code generated is the same. What > we're talking about is that should we let the preprocessor or compiler > take care of this. > > My take is that using preprocessor is not wise - it's the last resort if > there's no other proper way of doing things. Preprocessor requires all > sorts of extra parenthesis to protect against its deficiencies, and it > it merely a tool to do search-and-replace. Even multi-line needs special > treatment. Okay, so what you're saying here is that a huge number of people haven't been wise in using the preprocessor for register definitions all these years. That's a pretty bold statement. Now I obviously haven't looked at every single line in the kernel, but I have never come across this usage for static inline functions used for this. So, to be honest, I don't think this is really up for discussion. Of course if you come up with an example where this is done in a similar way I could be persuaded otherwise. > > What you need to consider as well is that many people that work with the > > Linux kernel expect code to be in a certain style. Register accesses of > > the form > > > > writel(value, base + OFFSET); > > > > are very common and expected to look a certain way, so if you write code > > that doesn't comply with these guidelines you make it extra hard for > > people to read the code. And that'll cost extra time, which people don't > > usually have in excess. > > But this has nothing to do with static inline vs. #define anymore, right? Of course it has. With the way you've chosen to define registers the code will look like this: writel(value, base + offset_r()) Maybe it's just me, but when I read code like that I need additional time to parse it as opposed to the canonical form. > > Maybe you can explain the usefulness of this some more. Why would it be > > easier to look at them in sysfs than in debugfs? You could be providing > > a simple list of syncpoints along with min/max, name, requested status, > > etc. in debugfs and it should be as easy to parse for both humans and > > machines as sysfs. I don't think IOCTLs would be any gain as they tend > > to have higher ABI stability requirements than debugfs (which doesn't > > have very strong requirements) or sysfs (which is often considered as a > > public ABI as well and therefore needs to be stable). > > debugfs is just a debugging tool, and user space cannot rely on it. Only > developers can rely on existence of debugfs, as they have the means to > enable it. > > sysfs is a place for actual APIs as you mention, and user space can rely > on them as proper APIs. That's what the values were exported for. But I don't see how that's relevant here. Let me quote what you said originally: > This is actually the only interface to read the max value to user space, > which can be useful for doing some comparisons that take wrapping into > account. But we could just add IOCTLs and remove the sysfs entries. To me that sounded like it was only used for debugging purposes. If you actually need to access this from a userspace driver then, as opposed to what I said earlier, this should be handled by some IOCTL. Thierry
On 30.11.2012 12:38, Thierry Reding wrote: > * PGP Signed by an unknown key > The above implies that when you submit code it shouldn't contain pieces > that prepare for possible future extensions which may or may not be > submitted (the exception being if such changes are part of a series > where subsequent patches actually use them). The outcome is that the > amount of cruft in the mainline kernel is kept to a minimum. And that's > a very good thing. We're now talking about actually a separation of logical and physical driver. I can't see why that's a bad thing. Especially considering that it's standard practice in well written drivers. Let's try to find a technical clean solution instead of debating politics. The latter should never be part of Linux kernel reviews. >> I could do that for upstream. In downstream it cannot depend on DEBUG >> flag, as these spews are an important part of how we debug problems with >> customer devices and the DEBUG flag is never on in customer builds. > > So I've just looked through these patches once more and I can't find > where this functionality is actually used. The host1x_syncpt_debug() > function is assigned to the nvhost_syncpt_ops.debug member, which in > turn is only used by nvhost_syncpt_debug(). The latter, however is > never used (not even by the debug infrastructure introduced in patch > 4). I have accidentally used the syncpt_op().debug() version directly. I'll fix that. > Okay, so what you're sayingCan here is that a huge number of people haven't > been wise in using the preprocessor for register definitions all these > years. That's a pretty bold statement. Now I obviously haven't looked at > every single line in the kernel, but I have never come across this usage > for static inline functions used for this. So, to be honest, I don't > think this is really up for discussion. Of course if you come up with an > example where this is done in a similar way I could be persuaded > otherwise. We must've talked about a bit different things. For pure register defs, I can accommodate changing to #defines. We'd lose the code coverage analysis, though, but if the parentheses are a make-or-break question to upstreaming, I can change. I was thinking of definitions like this: static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) { return (v & 0x1ff) << 0; } versus #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & 0x3ff Both of these produce the same machine code and have same usage, but the latter has type checking and code coverage analysis and the former is (in my eyes) clearer. In both of these cases the usage is like this: writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) | host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture + host1x_sync_cfpeek_ctrl_r()); > But I don't see how that's relevant here. Let me quote what you said > originally: > >> This is actually the only interface to read the max value to user space, >> which can be useful for doing some comparisons that take wrapping into >> account. But we could just add IOCTLs and remove the sysfs entries. > > To me that sounded like it was only used for debugging purposes. If you > actually need to access this from a userspace driver then, as opposed to > what I said earlier, this should be handled by some IOCTL. There's a use for production code to know both the max and min, but I think we can just scope that use out from this patch sest. User space can use these two for checking if one of their fences has already passed by comparing if the current value is between min and fence, taking wrapping into account. In these cases user space can f.ex. leave a host1x wait out from a command stream. Terje
On 30.11.2012 10:50, Lucas Stach wrote: > I'm with Thierry here. I think there is a fair chance that we won't get > the API right from the start, even when trying to come up with something > that sounds sane to everyone. It's also not desirable to delay gr2d > going into mainline until we are all completely satisfied with the API. > > I also fail to see how host1x module being in the DRM directory hinders > any downstream development. So I'm in favour of keeping host1x besides > the other DRM components to lower the burden for API changes and move it > out into some more generic directory, once we feel confident that the > API is reasonable stable. host1x module being in DRM directory hinders using nvhost from anywhere outside DRM in both upstream and downstream. I also don't like first putting the driver in one place, and then moving it with a huge commit to another place. We'd just postpone exactly the problems that were indicated earlier: we'd need to synchronize two trees to remove code in one and add in another at the same time so that there wouldn't be conflicting host1x drivers. I'd rather just add it in final place once, and be done with it. But if it's a make-it-or-brake-it for upstreaming, I can move it to be a subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd modify the MAINTAINER's file so that the tegradrm entry excludes host1x sub-directory, and I'd add another one which included only the host1x sub-directory? The host1x part would be Supported, whereas rest of tegradrm is Maintained. Best regards, Terje
On Sat, Dec 1, 2012 at 12:31 PM, Terje Bergström <tbergstrom@nvidia.com> wrote: > We must've talked about a bit different things. For pure register defs, > I can accommodate changing to #defines. We'd lose the code coverage > analysis, though, but if the parentheses are a make-or-break question to > upstreaming, I can change. Out of sheer curiosity: What are you using the coverage data of these register definitions for? When I looked into coverage analysis the resulting data seemed rather useless to me, since the important thing is how well we cover the entire dynamic state space of the hw+sw (e.g. crap left behind by the bios ...) and coverage seemed to be a poor proxy for that. Hence why I wonder what you're doing with this data ... -Daniel
On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote: > On 30.11.2012 12:38, Thierry Reding wrote: > > * PGP Signed by an unknown key > > The above implies that when you submit code it shouldn't contain pieces > > that prepare for possible future extensions which may or may not be > > submitted (the exception being if such changes are part of a series > > where subsequent patches actually use them). The outcome is that the > > amount of cruft in the mainline kernel is kept to a minimum. And that's > > a very good thing. > > We're now talking about actually a separation of logical and physical > driver. I can't see why that's a bad thing. Especially considering that > it's standard practice in well written drivers. Let's try to find a > technical clean solution instead of debating politics. The latter should > never be part of Linux kernel reviews. I don't know where you see politics in what I said. All I'm saying is that we shouldn't be making things needlessly complex. In my experience the technically cleanest solution is usually the one with the least complexity. > > Okay, so what you're sayingCan here is that a huge number of people haven't > > been wise in using the preprocessor for register definitions all these > > years. That's a pretty bold statement. Now I obviously haven't looked at > > every single line in the kernel, but I have never come across this usage > > for static inline functions used for this. So, to be honest, I don't > > think this is really up for discussion. Of course if you come up with an > > example where this is done in a similar way I could be persuaded > > otherwise. > > We must've talked about a bit different things. For pure register defs, > I can accommodate changing to #defines. We'd lose the code coverage > analysis, though, but if the parentheses are a make-or-break question to > upstreaming, I can change. > > I was thinking of definitions like this: > > static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) > { > return (v & 0x1ff) << 0; > } > > versus > > #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & 0x3ff > > Both of these produce the same machine code and have same usage, but the > latter has type checking and code coverage analysis and the former is > (in my eyes) clearer. In both of these cases the usage is like this: > > writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) > | host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) > | host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), > m->sync_aperture + host1x_sync_cfpeek_ctrl_r()); Again there's no precedent for doing this with static inline functions. You can do the same with macros. Type checking isn't an issue in these cases since we're talking about bitfields for which no proper type exists. Two other things about the examples above: the definitions should be all caps and it would be nice if they could be made a bit shorter. > > But I don't see how that's relevant here. Let me quote what you said > > originally: > > > >> This is actually the only interface to read the max value to user space, > >> which can be useful for doing some comparisons that take wrapping into > >> account. But we could just add IOCTLs and remove the sysfs entries. > > > > To me that sounded like it was only used for debugging purposes. If you > > actually need to access this from a userspace driver then, as opposed to > > what I said earlier, this should be handled by some IOCTL. > > There's a use for production code to know both the max and min, but I > think we can just scope that use out from this patch sest. > > User space can use these two for checking if one of their fences has > already passed by comparing if the current value is between min and > fence, taking wrapping into account. In these cases user space can f.ex. > leave a host1x wait out from a command stream. But you already have extra code in the kernel to patch out expired sync- points. Is it really worth the added effort to burden userspace with this? If so I still think some kind of generic IOCTL to retrieve information about a syncpoint would be better than a sysfs interface. Thierry
On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote: > On 30.11.2012 10:50, Lucas Stach wrote: > > I'm with Thierry here. I think there is a fair chance that we won't get > > the API right from the start, even when trying to come up with something > > that sounds sane to everyone. It's also not desirable to delay gr2d > > going into mainline until we are all completely satisfied with the API. > > > > I also fail to see how host1x module being in the DRM directory hinders > > any downstream development. So I'm in favour of keeping host1x besides > > the other DRM components to lower the burden for API changes and move it > > out into some more generic directory, once we feel confident that the > > API is reasonable stable. > > host1x module being in DRM directory hinders using nvhost from anywhere > outside DRM in both upstream and downstream. That's not true. Nothing keeps the rest of the kernel from using an API exported by the tegra-drm driver. > I also don't like first putting the driver in one place, and then > moving it with a huge commit to another place. Hehe, you're doing exactly that in this patch series. =) > We'd just postpone exactly the problems that were indicated earlier: > we'd need to synchronize two trees to remove code in one and add in > another at the same time so that there wouldn't be conflicting host1x > drivers. I'd rather just add it in final place once, and be done with > it. Yes, there would be a certain amount of synchronization needed, but as Stephen correctly pointed out we could do that move through one tree with the Acked-by of the other maintainer. The point is that we need to do this once instead of everytime the API changes. > But if it's a make-it-or-brake-it for upstreaming, I can move it to be a > subdirectory under drivers/gpu/drm/tegra. Would this mean that we'd > modify the MAINTAINER's file so that the tegradrm entry excludes host1x > sub-directory, and I'd add another one which included only the host1x > sub-directory? The host1x part would be Supported, whereas rest of > tegradrm is Maintained. An entry for drivers/gpu/drm/tegra/host1x would override an entry for drivers/gpu/drm/tegra so no need to exclude it. That said, there's no way to exclude an subdirectory in MAINTAINERS that I know of. My main point for keeping host1x within tegra-drm for now was that it could possibly help speed up the inclusion of the host1x code. Seeing that there's still a substantial amount of work to be done and a need for discussion I'm not sure if rushing this is the best way. In that case there may be justification for putting it in a separate location from the start. Thierry
On 01.12.2012 15:42, Daniel Vetter wrote: > Out of sheer curiosity: What are you using the coverage data of these > register definitions for? When I looked into coverage analysis the > resulting data seemed rather useless to me, since the important thing > is how well we cover the entire dynamic state space of the hw+sw (e.g. > crap left behind by the bios ...) and coverage seemed to be a poor > proxy for that. Hence why I wonder what you're doing with this data Yes, it's a poor proxy. But still, I use it to determine how big portions of hardware address space and fields I'm touching when running host1x tests. It's interesting data for planning tests, but not much more. Best regards, Terje
On 01.12.2012 17:10, Thierry Reding wrote: > On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote: >> host1x module being in DRM directory hinders using nvhost from anywhere >> outside DRM in both upstream and downstream. > > That's not true. Nothing keeps the rest of the kernel from using an API > exported by the tegra-drm driver. Right, it's just a directory. I was actually thinking that it'd be weird if a V4L2 driver would use something from inside drivers/gpu/drm/tegra (V4L use DRM? Oh nooooo!). Shoot the idea down if it's crazy, but please think about it first. :-) I started thinking about this and we are constrained by the Linux kernel subsystems that have a complete different architecture than hardware. This leads to awkward design as DRM design as it conflicts with the way hardware works. Placing host1x driver in one place, DRM driver in another and XYZ driver in yet another is not ideal either. We're exposing a public API which needs to be strictly maintained, because we maintain drivers in different trees, but then again, the list of users is very static and well-defined, so public API is an overshoot. How about if we look at this from the hardware architecture point of view? You mentioned that perhaps drivers/bus/host1x would be the best place for host1x driver. What if we put also all host1x client modules under that same directory? drivers/bus/host1x/drm would be for DRM interface, and all other host1x client module drivers could be placed similarly. This way we could keep the host1x API private to host1x and the client module drivers, and it's easy to understand how host1x is used by just following the directory structure. Naturally, we could also think if we want to have sub-components per host1x client (dc, 2d, etc) and a drm sub-component that implements the DRM interface, and a V4L2 sub-component that implements V4L2 interface (when/if I can convince people that camera should go upstream). >> I also don't like first putting the driver in one place, and then >> moving it with a huge commit to another place. > > Hehe, you're doing exactly that in this patch series. =) True, I guess it's just a matter of determining what's the best time. > Yes, there would be a certain amount of synchronization needed, but as > Stephen correctly pointed out we could do that move through one tree > with the Acked-by of the other maintainer. The point is that we need to > do this once instead of everytime the API changes. Yep, inter-tree synchronization is possible, so not a show stopper. > An entry for drivers/gpu/drm/tegra/host1x would override an entry for > drivers/gpu/drm/tegra so no need to exclude it. That said, there's no > way to exclude an subdirectory in MAINTAINERS that I know of. I saw tag X: in MAINTAINERS file, so that could be used. There's documentation for it, but also some examples like: IBM Power Virtual SCSI/FC Device Drivers M: Robert Jennings <rcj@linux.vnet.ibm.com> L: linux-scsi@vger.kernel.org S: Supported F: drivers/scsi/ibmvscsi/ X: drivers/scsi/ibmvscsi/ibmvstgt.c > My main point for keeping host1x within tegra-drm for now was that it > could possibly help speed up the inclusion of the host1x code. Seeing > that there's still a substantial amount of work to be done and a need > for discussion I'm not sure if rushing this is the best way. In that > case there may be justification for putting it in a separate location > from the start. I'm not in a hurry, so let's try to figure the best design first. Biggest architectural unsolved problem is the memory management and relationship between tegradrm and host1x driver. What Lucas proposed about memory management makes sense, but it'll take a while to implement it. The rest of the unsolved questions are more about differences in opinion, and solvable. Terje
On 01.12.2012 16:58, Thierry Reding wrote: > I don't know where you see politics in what I said. All I'm saying is > that we shouldn't be making things needlessly complex. In my experience > the technically cleanest solution is usually the one with the least > complexity. Let me come up with a proposal and let's then see where to go next. > But you already have extra code in the kernel to patch out expired sync- > points. Is it really worth the added effort to burden userspace with > this? If so I still think some kind of generic IOCTL to retrieve > information about a syncpoint would be better than a sysfs interface. That's exactly why I mentioned that it's not useful to upstream. There are some cases where user space might want to check if a fence has passed without waiting for it, but that's marginal and could be handled even with waits with zero timeout. Terje
Am Samstag, den 01.12.2012, 18:55 +0200 schrieb Terje Bergström: > On 01.12.2012 17:10, Thierry Reding wrote: > > On Sat, Dec 01, 2012 at 01:44:41PM +0200, Terje Bergström wrote: > >> host1x module being in DRM directory hinders using nvhost from anywhere > >> outside DRM in both upstream and downstream. > > > > That's not true. Nothing keeps the rest of the kernel from using an API > > exported by the tegra-drm driver. > > Right, it's just a directory. I was actually thinking that it'd be weird > if a V4L2 driver would use something from inside drivers/gpu/drm/tegra > (V4L use DRM? Oh nooooo!). > Yes it _is_ weird to have V4L using something which resides inside DRM, but see below. > Shoot the idea down if it's crazy, but please think about it first. :-) > > I started thinking about this and we are constrained by the Linux kernel > subsystems that have a complete different architecture than hardware. > This leads to awkward design as DRM design as it conflicts with the way > hardware works. > > Placing host1x driver in one place, DRM driver in another and XYZ driver > in yet another is not ideal either. We're exposing a public API which > needs to be strictly maintained, because we maintain drivers in > different trees, but then again, the list of users is very static and > well-defined, so public API is an overshoot. > How about if we look at this from the hardware architecture point of > view? You mentioned that perhaps drivers/bus/host1x would be the best > place for host1x driver. > > What if we put also all host1x client modules under that same directory? > drivers/bus/host1x/drm would be for DRM interface, and all other host1x > client module drivers could be placed similarly. This way we could keep > the host1x API private to host1x and the client module drivers, and it's > easy to understand how host1x is used by just following the directory > structure. > This would certainly make life easier, but personally I don't think it's the right thing to do. The separation of the Linux kernel into different subsystems was done for a reason and just because the specific hardware at hands happens to work a bit different is no valid reason to break with the standard rules of the kernel. So I think there is no way around handling the different drivers that use host1x in different trees. For the time being there is _only_ tegra-drm using host1x in the upstream kernel. We have to make sure to come up with some API which is reasonably stable, so we don't run into big problems later. That's why I'm really in favour to keep host1x and tegra-drm side by side in the current upstream, to make sure we can change the API without jumping through too much hoops. Your downstream V4L would have to use host1x from the DRM directory, but really: is your downstream such a nice, clean codebase that you are not able to cope with the slight ugliness of this solution? > Naturally, we could also think if we want to have sub-components per > host1x client (dc, 2d, etc) and a drm sub-component that implements the > DRM interface, and a V4L2 sub-component that implements V4L2 interface > (when/if I can convince people that camera should go upstream). > To me this sound as if V4L upstream support is still a fair time away. IMHO the right time to move out host1x is exactly the point when a second user starts appearing upstream. This will give us some time to fiddle with the API until we have to commit to it as being stable. > >> I also don't like first putting the driver in one place, and then > >> moving it with a huge commit to another place. > > > > Hehe, you're doing exactly that in this patch series. =) > > True, I guess it's just a matter of determining what's the best time. > See above. [...] > I'm not in a hurry, so let's try to figure the best design first. > Biggest architectural unsolved problem is the memory management and > relationship between tegradrm and host1x driver. What Lucas proposed > about memory management makes sense, but it'll take a while to implement it. Please make sure to remove any unnecessary cruft from host1x in the process and don't try to make too big of a step at once. We only need one type of memory within host1x: native host1x objects, no need to plan for support of anything else. Also taking over ownership of the IOMMU address space might take some more work in the IOMMU API. We can leave this out completely for a start. Both Tegra 2 and 3 should be able to work with CMA backed objects just fine. Regards, Lucas
On 01.12.2012 19:34, Lucas Stach wrote: > This would certainly make life easier, but personally I don't think it's > the right thing to do. The separation of the Linux kernel into different > subsystems was done for a reason and just because the specific hardware > at hands happens to work a bit different is no valid reason to break > with the standard rules of the kernel. > > So I think there is no way around handling the different drivers that > use host1x in different trees. For the time being there is _only_ > tegra-drm using host1x in the upstream kernel. We have to make sure to > come up with some API which is reasonably stable, so we don't run into > big problems later. That's why I'm really in favour to keep host1x and > tegra-drm side by side in the current upstream, to make sure we can > change the API without jumping through too much hoops. > > Your downstream V4L would have to use host1x from the DRM directory, but > really: is your downstream such a nice, clean codebase that you are not > able to cope with the slight ugliness of this solution? Ok, can do. I'll move the code base to drivers/gpu/drm/tegra/host1x. For downstream, the host1x driver implements all user space APIs (no drm, no v4l, etc) so the directory is of no consequence. If we immersed host1x driver totally with tegra-drm, that'd be a problem, but if I can keep a separation, that's fine. > Please make sure to remove any unnecessary cruft from host1x in the > process and don't try to make too big of a step at once. We only need > one type of memory within host1x: native host1x objects, no need to plan > for support of anything else. Also taking over ownership of the IOMMU > address space might take some more work in the IOMMU API. We can leave > this out completely for a start. Both Tegra 2 and 3 should be able to > work with CMA backed objects just fine. Ok, that simplifies the process. I'll just implement firewall and copy the strema over to kernel space unconditionally. Terje
Guys I think you guys might be overthniking things here. I know you have some sort of upstream/downstream split, but really in the upstream kernel, we don't care about that, so don't make it our problem. There is no need for any sort of stable API between host1x and the sub drivers, we change APIs in the kernel the whole time it isn't a problem. If you need to change the API, submit a single patch changing it across all the drivers in the tree, collecting Acks or not as needed. We do this the whole time, I've never had or seen a problem with it. We don't do separate subsystems APIs set in stone bullshit, and all subsystem maintainers are used to dealing with these sort of issues. You get an ack from one maintainer and the other one sticks it in his tree with a note to Linus. You can put the code where you want, maybe just under drivers/gpu instead of drivers/video or drivers/gpu/drm, just make sure you have a path for it into the kernel. And I have an non-upstream precedent for v4l sitting on drm, some radeon GPUs have capture tuners, and the only way to implement that would be to stick a v4l driver in the radeon drm driver. Not a problem, just never finished writing the code. Dave.
On Sun, Dec 02, 2012 at 07:42:06AM +1000, Dave Airlie wrote: > Guys I think you guys might be overthniking things here. > > I know you have some sort of upstream/downstream split, but really in > the upstream kernel, we don't care about that, so don't make it our > problem. > > There is no need for any sort of stable API between host1x and the sub > drivers, we change APIs in the kernel the whole time it isn't a > problem. Point taken. I was primarily concerned about needless churn during early development. But given the latest discussions it has become clear that there's no need to rush things and therefore we should be able to resolve any potential issues that could result in churn before the first patches are merged. > You can put the code where you want, maybe just under drivers/gpu > instead of drivers/video or drivers/gpu/drm, just make sure you have a > path for it into the kernel. drivers/gpu/host1x sounds like a good location to me. Does that still go in via your tree? Thierry
On 01.12.2012 23:42, Dave Airlie wrote: > Guys I think you guys might be overthniking things here. > > I know you have some sort of upstream/downstream split, but really in > the upstream kernel, we don't care about that, so don't make it our > problem. I am not trying to make anything your problem. Most of the issues we have already worked out with a good solution that all active participants have agreed with. We have only a couple of disagreements with Thierry. My goal is to get a good open source co-operation and trying to prevent a code fork while still maintaining good design. That way everybody wins. The way to do that is to base our BSP on upstream kernel. I'm not trying to here throw code over the fence and flee. This is a genuine attempt to work together. I want to prevent the "we" (kernel community excluding NVIDIA) and "you" (NVIDIA) that a split code base would cause in the long run. I'd like to just talk about "we" including NVIDIA. > There is no need for any sort of stable API between host1x and the sub > drivers, we change APIs in the kernel the whole time it isn't a > problem. > > If you need to change the API, submit a single patch changing it > across all the drivers in the tree, collecting Acks or not as needed. > We do this the whole time, I've never had or seen a problem with it. > > We don't do separate subsystems APIs set in stone bullshit, and all > subsystem maintainers are used to dealing with these sort of issues. > You get an ack from one maintainer and the other one sticks it in his > tree with a note to Linus. > > You can put the code where you want, maybe just under drivers/gpu > instead of drivers/video or drivers/gpu/drm, just make sure you have a > path for it into the kernel. Follows exactly my thinking, as the location of host1x driver has no practical consequence to me. Thierry proposed drivers/gpu/host1x. I'd like to see a couple of comments on that proposal, and if it sticks, follow that. Thierry, did you mean that host1x driver would be in drivers/gpu/host1x, and tegradrm in drivers/gpu/drm/tegra, or would we put both in same directory? > And I have an non-upstream precedent for v4l sitting on drm, some > radeon GPUs have capture tuners, and the only way to implement that > would be to stick a v4l driver in the radeon drm driver. Not a > problem, just never finished writing the code. Yes, I just mentioned that as awkward, but I have no problem with any path. Terje
On Sun, Dec 02, 2012 at 01:24:13PM +0200, Terje Bergström wrote: > On 01.12.2012 23:42, Dave Airlie wrote: > > Guys I think you guys might be overthniking things here. > > > > I know you have some sort of upstream/downstream split, but really in > > the upstream kernel, we don't care about that, so don't make it our > > problem. > > I am not trying to make anything your problem. Most of the issues we > have already worked out with a good solution that all active > participants have agreed with. We have only a couple of disagreements > with Thierry. > > My goal is to get a good open source co-operation and trying to prevent > a code fork while still maintaining good design. That way everybody > wins. The way to do that is to base our BSP on upstream kernel. Yes, that's exactly what you should be doing. > I'm not trying to here throw code over the fence and flee. This is a > genuine attempt to work together. I want to prevent the "we" (kernel > community excluding NVIDIA) and "you" (NVIDIA) that a split code base > would cause in the long run. I'd like to just talk about "we" including > NVIDIA. FWIW I'm convinced that you're genuinely trying to make this work and nobody welcomes this more than me. However it is only natural if you dump such a large body of code on the community that people will disagree with some of the design decisions. So when I comment on the design or patches in general, it is not my intention to exclude you or NVIDIA in any way. All I'm trying to do is spot problematic or unclear parts that will make working with the code any more difficult than it has to be. > > There is no need for any sort of stable API between host1x and the sub > > drivers, we change APIs in the kernel the whole time it isn't a > > problem. > > > > If you need to change the API, submit a single patch changing it > > across all the drivers in the tree, collecting Acks or not as needed. > > We do this the whole time, I've never had or seen a problem with it. > > > > We don't do separate subsystems APIs set in stone bullshit, and all > > subsystem maintainers are used to dealing with these sort of issues. > > You get an ack from one maintainer and the other one sticks it in his > > tree with a note to Linus. > > > > You can put the code where you want, maybe just under drivers/gpu > > instead of drivers/video or drivers/gpu/drm, just make sure you have a > > path for it into the kernel. > > Follows exactly my thinking, as the location of host1x driver has no > practical consequence to me. > > Thierry proposed drivers/gpu/host1x. I'd like to see a couple of > comments on that proposal, and if it sticks, follow that. > > Thierry, did you mean that host1x driver would be in drivers/gpu/host1x, > and tegradrm in drivers/gpu/drm/tegra, or would we put both in same > directory? Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can also live with the host1x driver staying in drivers/video, but I don't think it's the proper location and drivers/gpu/host1x seems like a much better fit. Thierry
On 02.12.2012 22:55, Thierry Reding wrote: > FWIW I'm convinced that you're genuinely trying to make this work and > nobody welcomes this more than me. However it is only natural if you > dump such a large body of code on the community that people will > disagree with some of the design decisions. > > So when I comment on the design or patches in general, it is not my > intention to exclude you or NVIDIA in any way. All I'm trying to do is > spot problematic or unclear parts that will make working with the code > any more difficult than it has to be. Thanks, I know it'a a large dump and you've made great comments about the code, and hit most of the sore spots I've had with the driver, too. I appreciate your effort - this process is making the driver better. It's good to hear that our goals are aligned. So now that we got that out of the system, let's get back to business. :-) > Since tegra-drm is a DRM driver it should stay in drivers/gpu/drm. I can > also live with the host1x driver staying in drivers/video, but I don't > think it's the proper location and drivers/gpu/host1x seems like a much > better fit. That sounds like a plan to me. Terje
On 11/30/2012 03:38 AM, Thierry Reding wrote: > On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote: >> On 29.11.2012 13:47, Thierry Reding wrote: >>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström >>> wrote: >>>> Tegra20 and Tegra30 are compatible, but future chips are not. >>>> I was hoping we would be ready in upstream kernel for future >>>> chips. >>> >>> I think we should ignore that problem for now. Generally >>> planning for any possible combination of incompatibilities >>> leads to overgeneralized designs that require precisely these >>> kinds of indirections. >>> >>> Once some documentation for Tegra 40 materializes we can start >>> thinking about how to encapsulate the incompatible code. >> >> I think here our perspectives differ a lot. That is natural >> considering the company I work for and company you work for, so >> let's try to sync the perspective. >> >> In my reality, whatever is in market is old news and I barely >> work on them anymore. Upstreaming activity is the exception. 90% >> of my time is spent dealing with future chips which I know cannot >> be handled without this split to logical and physical driver >> parts. >> >> For you, Tegra2 and Tegra3 are the reality. > > To be fair, Tegra2 and Tegra3 are the reality for *everyone* > *outside* NVIDIA. > > It's great that you spend most of your time working with future > chips, but unless you submit the code for inclusion or review > nobody upstream needs to be concerned about the implications. Most > people don't have time to waste so we naturally try to keep the > maintenance burden to a minimum. > > The above implies that when you submit code it shouldn't contain > pieces that prepare for possible future extensions which may or may > not be submitted (the exception being if such changes are part of a > series where subsequent patches actually use them). The outcome is > that the amount of cruft in the mainline kernel is kept to a > minimum. And that's a very good thing. I think there's room for letting Terje's complete knowledge of future chips guide the design of the current code that's sent upstream. Certainly we shouldn't add a ton of unnecessary abstraction layers right now that aren't needed for Tegra20/30, but if there's some decision that doesn't affect the bloat, opaqueness, ... of the current code but one choice is better for future development without serious negatives for the current code, it's pretty reasonable to make that decision rather than the other. (That all said, I haven't really followed the details of this particular point, so I can't say how my comment applies to any decisions being made right now - just that we shouldn't blanket reject future knowledge when making decisions) After all, making the right decision now will reduce the number/size of patches later, and hence reduce code churn and reviewer load.
On 12/01/2012 07:58 AM, Thierry Reding wrote: > On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote: ... >> I was thinking of definitions like this: >> >> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) { >> return (v & 0x1ff) << 0; } >> >> versus >> >> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & >> 0x3ff >> >> Both of these produce the same machine code and have same usage, >> but the latter has type checking and code coverage analysis and >> the former is (in my eyes) clearer. In both of these cases the >> usage is like this: >> >> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) | >> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) | >> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture + >> host1x_sync_cfpeek_ctrl_r()); > > Again there's no precedent for doing this with static inline > functions. You can do the same with macros. Type checking isn't an > issue in these cases since we're talking about bitfields for which > no proper type exists. I suspect the inline functions could encode signed-vs-unsigned fields, perhaps catch u8 variables when they should have been u32, etc.?
On Mon, Dec 03, 2012 at 12:20:30PM -0700, Stephen Warren wrote: > On 11/30/2012 03:38 AM, Thierry Reding wrote: > > On Fri, Nov 30, 2012 at 10:56:39AM +0200, Terje Bergström wrote: > >> On 29.11.2012 13:47, Thierry Reding wrote: > >>> On Thu, Nov 29, 2012 at 12:21:04PM +0200, Terje Bergström > >>> wrote: > >>>> Tegra20 and Tegra30 are compatible, but future chips are not. > >>>> I was hoping we would be ready in upstream kernel for future > >>>> chips. > >>> > >>> I think we should ignore that problem for now. Generally > >>> planning for any possible combination of incompatibilities > >>> leads to overgeneralized designs that require precisely these > >>> kinds of indirections. > >>> > >>> Once some documentation for Tegra 40 materializes we can start > >>> thinking about how to encapsulate the incompatible code. > >> > >> I think here our perspectives differ a lot. That is natural > >> considering the company I work for and company you work for, so > >> let's try to sync the perspective. > >> > >> In my reality, whatever is in market is old news and I barely > >> work on them anymore. Upstreaming activity is the exception. 90% > >> of my time is spent dealing with future chips which I know cannot > >> be handled without this split to logical and physical driver > >> parts. > >> > >> For you, Tegra2 and Tegra3 are the reality. > > > > To be fair, Tegra2 and Tegra3 are the reality for *everyone* > > *outside* NVIDIA. > > > > It's great that you spend most of your time working with future > > chips, but unless you submit the code for inclusion or review > > nobody upstream needs to be concerned about the implications. Most > > people don't have time to waste so we naturally try to keep the > > maintenance burden to a minimum. > > > > The above implies that when you submit code it shouldn't contain > > pieces that prepare for possible future extensions which may or may > > not be submitted (the exception being if such changes are part of a > > series where subsequent patches actually use them). The outcome is > > that the amount of cruft in the mainline kernel is kept to a > > minimum. And that's a very good thing. > > I think there's room for letting Terje's complete knowledge of future > chips guide the design of the current code that's sent upstream. > Certainly we shouldn't add a ton of unnecessary abstraction layers > right now that aren't needed for Tegra20/30, but if there's some > decision that doesn't affect the bloat, opaqueness, ... of the current > code but one choice is better for future development without serious > negatives for the current code, it's pretty reasonable to make that > decision rather than the other. The original point was that the current design stashes every function of host1x into an ops structure and you have to go through those ops to get at the functionality. I can understand the need to add an ops structure to cope with incompatibilities between versions, but as you say there should to be a reason for them being introduced. If such reasons exists, then I think they at least warrant a comment somewhere. Furthermore this is usually best handled by wrapping the ops accesses in a public API, so that the ops structure can be hidden within the driver. For example, submitting a job to a channel should have a public API such as: int host1x_channel_submit(struct host1x_channel *channel, struct host1x_job *job) { ... } An initial implementation would just add the code into this function. If it turns out some future version requires special incantations to submit a job, only then should we introduce an ops structure, with only the one function: struct host1x_channel_ops { int (*submit)(struct host1x_channel *channel, struct host1x_job *job); }; But since only the public API above has been used, access to the special implementation can be hidden from the user. So the public function could be modified in this way: int host1x_channel_submit(struct hostx1_channel *channel, struct host1x_job *job) { if (channel->ops && channel->ops->submit) return channel->ops->submit(channel, job); ... } And then you have two choices: either you keep the code for previous generations after the if block or you provide a separate ops structure for older generations as well and handle them via the same code path. One other thing that such a design can help with is refactoring common code or parameterizing code. Maybe newer generations are not compatible but can easily be made to work with existing code by introducing a variable such as register stride or something. What's really difficult to follow is if an ops structure is accessed via some global macro. It also breaks encapsulation because you have a global ops structure. That may even work fine for now, but it will break once you have more than a single host1x in a system. I know this will never happen, but all of a sudden it happens anyway and the code breaks. Doing this right isn't very hard and it will lead to a better design and is less likely to break at some point. Thierry
On 12/04/2012 05:03 AM, Thierry Reding wrote: [...] >> I think there's room for letting Terje's complete knowledge of future >> chips guide the design of the current code that's sent upstream. >> Certainly we shouldn't add a ton of unnecessary abstraction layers >> right now that aren't needed for Tegra20/30, but if there's some >> decision that doesn't affect the bloat, opaqueness, ... of the current >> code but one choice is better for future development without serious >> negatives for the current code, it's pretty reasonable to make that >> decision rather than the other. > > The original point was that the current design stashes every function of > host1x into an ops structure and you have to go through those ops to get > at the functionality. I can understand the need to add an ops structure > to cope with incompatibilities between versions, but as you say there > should to be a reason for them being introduced. If such reasons exists, > then I think they at least warrant a comment somewhere. > > Furthermore this is usually best handled by wrapping the ops accesses in > a public API, so that the ops structure can be hidden within the driver. > For example, submitting a job to a channel should have a public API such > as: > > int host1x_channel_submit(struct host1x_channel *channel, > struct host1x_job *job) > { > ... > } > > An initial implementation would just add the code into this function. If > it turns out some future version requires special incantations to submit > a job, only then should we introduce an ops structure, with only the one > function: > > struct host1x_channel_ops { > int (*submit)(struct host1x_channel *channel, > struct host1x_job *job); > }; > > But since only the public API above has been used, access to the special > implementation can be hidden from the user. So the public function could > be modified in this way: > > int host1x_channel_submit(struct hostx1_channel *channel, > struct host1x_job *job) > { > if (channel->ops && channel->ops->submit) > return channel->ops->submit(channel, job); > > ... > } > I guess we do this in exactly this way at the beginning. Then we realized that we need to define callbacks to make different tegra has different logics. So that's why we see the codes have lots of function ops right now. If so, this will make Terje modify the code back to the original version, and this is not an interesting work. Just my personal guess, no offence. Mark > And then you have two choices: either you keep the code for previous > generations after the if block or you provide a separate ops structure > for older generations as well and handle them via the same code path. > > One other thing that such a design can help with is refactoring common > code or parameterizing code. Maybe newer generations are not compatible > but can easily be made to work with existing code by introducing a > variable such as register stride or something. > > What's really difficult to follow is if an ops structure is accessed via > some global macro. It also breaks encapsulation because you have a > global ops structure. That may even work fine for now, but it will break > once you have more than a single host1x in a system. I know this will > never happen, but all of a sudden it happens anyway and the code breaks. > Doing this right isn't very hard and it will lead to a better design and > is less likely to break at some point. > > Thierry > > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel >
On 12/04/2012 05:03 AM, Thierry Reding wrote: [...] > > One other thing that such a design can help with is refactoring common > code or parameterizing code. Maybe newer generations are not compatible > but can easily be made to work with existing code by introducing a > variable such as register stride or something. > > What's really difficult to follow is if an ops structure is accessed via > some global macro. It also breaks encapsulation because you have a > global ops structure. That may even work fine for now, but it will break > once you have more than a single host1x in a system. I know this will > never happen, but all of a sudden it happens anyway and the code breaks. > Doing this right isn't very hard and it will lead to a better design and > is less likely to break at some point. > Sorry I forget to reply this in last mail... Agree. Even for userspace programs, we should avoid using global variables as possible as we can. So we need to think about this and try to reduce the number of global vars. Mark > Thierry > > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel >
On 03.12.2012 23:03, Thierry Reding wrote: > What's really difficult to follow is if an ops structure is accessed via > some global macro. It also breaks encapsulation because you have a > global ops structure. That may even work fine for now, but it will break > once you have more than a single host1x in a system. I know this will > never happen, but all of a sudden it happens anyway and the code breaks. > Doing this right isn't very hard and it will lead to a better design and > is less likely to break at some point. I agree that the chip ops access goes through too much indirection and macro magic (which I already declared I hate), so we're going to design something simpler. Terje
On Mon, Dec 03, 2012 at 12:23:32PM -0700, Stephen Warren wrote: > On 12/01/2012 07:58 AM, Thierry Reding wrote: > > On Sat, Dec 01, 2012 at 01:31:02PM +0200, Terje Bergström wrote: > ... > >> I was thinking of definitions like this: > >> > >> static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) { > >> return (v & 0x1ff) << 0; } > >> > >> versus > >> > >> #define host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) ((v) >> 16) & > >> 0x3ff > >> > >> Both of these produce the same machine code and have same usage, > >> but the latter has type checking and code coverage analysis and > >> the former is (in my eyes) clearer. In both of these cases the > >> usage is like this: > >> > >> writel(host1x_sync_cfpeek_ctrl_cfpeek_ena_f(1) | > >> host1x_sync_cfpeek_ctrl_cfpeek_channr_f(chid) | > >> host1x_sync_cfpeek_ctrl_cfpeek_addr_f(rd_ptr), m->sync_aperture + > >> host1x_sync_cfpeek_ctrl_r()); > > > > Again there's no precedent for doing this with static inline > > functions. You can do the same with macros. Type checking isn't an > > issue in these cases since we're talking about bitfields for which > > no proper type exists. > > I suspect the inline functions could encode signed-vs-unsigned fields, > perhaps catch u8 variables when they should have been u32, etc.? I don't see how this would be relevant here. These definitions are only used in the driver internally and not a public API, therefore none of those checks should really be needed. If somebody writes code for this driver and uses the register definitions, they better know what they're doing. Or at least wrong usage should be filtered out through review. In my opinion the consistency with how other drivers are written far outweigh the benefits provided by inline functions. That said, I'm out of arguments and I don't have a final say anyway, so if it is decided to stick with inline functions I can find a way to live with them. Thierry
On 29.11.2012 11:10, Mark Zhang wrote: >> + >> +/** >> + * Write a cpu syncpoint increment to the hardware, without touching >> + * the cache. Caller is responsible for host being powered. >> + */ >> +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id) >> +{ >> + struct nvhost_master *dev = syncpt_to_dev(sp); >> + u32 reg_offset = id / 32; >> + >> + if (!nvhost_module_powered(dev->dev)) { >> + dev_err(&syncpt_to_dev(sp)->dev->dev, >> + "Trying to access host1x when it's off"); >> + return; >> + } >> + >> + if (!nvhost_syncpt_client_managed(sp, id) >> + && nvhost_syncpt_min_eq_max(sp, id)) { >> + dev_err(&syncpt_to_dev(sp)->dev->dev, >> + "Trying to increment syncpoint id %d beyond max\n", >> + id); >> + return; >> + } >> + writel(BIT_MASK(id), dev->sync_aperture + >> + host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4); > > I have a stupid question: According to the name and the context of this > function, seems it increases the syncpt value which specified by param > "id". So how does this "writel" increase the value? I don't know much > about host1x/syncpt reg operations, so could you explain a little bit or > I just completely have a wrong understanding? I believe I've implemented most of the requests in this mail, but I seem to have missed answering this question. writel() to that register invokes a method in host1x, which increments the sync point indicated by the value of the register by one. Best regards, Terje
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig index fb9a14e..94c861b 100644 --- a/drivers/video/Kconfig +++ b/drivers/video/Kconfig @@ -2463,4 +2463,6 @@ config FB_SH_MOBILE_MERAM Up to 4 memory channels can be configured, allowing 4 RGB or 2 YCbCr framebuffers to be configured. +source "drivers/video/tegra/host/Kconfig" + endmenu diff --git a/drivers/video/Makefile b/drivers/video/Makefile index b936b00..61a4287 100644 --- a/drivers/video/Makefile +++ b/drivers/video/Makefile @@ -17,6 +17,8 @@ obj-y += backlight/ obj-$(CONFIG_EXYNOS_VIDEO) += exynos/ +obj-$(CONFIG_TEGRA_HOST1X) += tegra/host/ + obj-$(CONFIG_FB_CFB_FILLRECT) += cfbfillrect.o obj-$(CONFIG_FB_CFB_COPYAREA) += cfbcopyarea.o obj-$(CONFIG_FB_CFB_IMAGEBLIT) += cfbimgblt.o diff --git a/drivers/video/tegra/host/Kconfig b/drivers/video/tegra/host/Kconfig new file mode 100644 index 0000000..ebe9bbc --- /dev/null +++ b/drivers/video/tegra/host/Kconfig @@ -0,0 +1,5 @@ +config TEGRA_HOST1X + tristate "Tegra host1x driver" + help + Driver for the Tegra host1x hardware. + diff --git a/drivers/video/tegra/host/Makefile b/drivers/video/tegra/host/Makefile new file mode 100644 index 0000000..3edab4a --- /dev/null +++ b/drivers/video/tegra/host/Makefile @@ -0,0 +1,10 @@ +ccflags-y = -Idrivers/video/tegra/host + +nvhost-objs = \ + nvhost_acm.o \ + nvhost_syncpt.o \ + dev.o \ + chip_support.o + +obj-$(CONFIG_TEGRA_HOST1X) += host1x/ +obj-$(CONFIG_TEGRA_HOST1X) += nvhost.o diff --git a/drivers/video/tegra/host/chip_support.c b/drivers/video/tegra/host/chip_support.c new file mode 100644 index 0000000..5a44147 --- /dev/null +++ b/drivers/video/tegra/host/chip_support.c @@ -0,0 +1,48 @@ +/* + * drivers/video/tegra/host/chip_support.c + * + * Tegra host1x chip support module + * + * Copyright (c) 2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/errno.h> +#include <linux/types.h> +#include <linux/slab.h> + +#include "chip_support.h" +#include "host1x/host1x01.h" + +struct nvhost_chip_support *nvhost_chip_ops; + +struct nvhost_chip_support *nvhost_get_chip_ops(void) +{ + return nvhost_chip_ops; +} + +int nvhost_init_chip_support(struct nvhost_master *host) +{ + if (nvhost_chip_ops == NULL) { + nvhost_chip_ops = kzalloc(sizeof(*nvhost_chip_ops), GFP_KERNEL); + if (nvhost_chip_ops == NULL) { + pr_err("%s: Cannot allocate nvhost_chip_support\n", + __func__); + return -ENOMEM; + } + } + + nvhost_init_host1x01_support(host, nvhost_chip_ops); + return 0; +} diff --git a/drivers/video/tegra/host/chip_support.h b/drivers/video/tegra/host/chip_support.h new file mode 100644 index 0000000..acfa2f1 --- /dev/null +++ b/drivers/video/tegra/host/chip_support.h @@ -0,0 +1,52 @@ +/* + * drivers/video/tegra/host/chip_support.h + * + * Tegra host1x chip Support + * + * Copyright (c) 2011-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ +#ifndef _NVHOST_CHIP_SUPPORT_H_ +#define _NVHOST_CHIP_SUPPORT_H_ + +#include <linux/types.h> + +struct output; + +struct nvhost_master; +struct nvhost_syncpt; +struct platform_device; + +struct nvhost_syncpt_ops { + void (*reset)(struct nvhost_syncpt *, u32 id); + void (*reset_wait_base)(struct nvhost_syncpt *, u32 id); + void (*read_wait_base)(struct nvhost_syncpt *, u32 id); + u32 (*update_min)(struct nvhost_syncpt *, u32 id); + void (*cpu_incr)(struct nvhost_syncpt *, u32 id); + void (*debug)(struct nvhost_syncpt *); + const char * (*name)(struct nvhost_syncpt *, u32 id); +}; + +struct nvhost_chip_support { + const char *soc_name; + struct nvhost_syncpt_ops syncpt; +}; + +struct nvhost_chip_support *nvhost_get_chip_ops(void); + +#define syncpt_op() (nvhost_get_chip_ops()->syncpt) + +int nvhost_init_chip_support(struct nvhost_master *host); + +#endif /* _NVHOST_CHIP_SUPPORT_H_ */ diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c new file mode 100644 index 0000000..98c9c9f --- /dev/null +++ b/drivers/video/tegra/host/dev.c @@ -0,0 +1,96 @@ +/* + * drivers/video/tegra/host/dev.c + * + * Tegra host1x driver + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/module.h> +#include "host1x/host1x.h" +#include "nvhost_acm.h" + +u32 host1x_syncpt_incr_max(u32 id, u32 incrs) +{ + struct nvhost_syncpt *sp = &nvhost->syncpt; + return nvhost_syncpt_incr_max(sp, id, incrs); +} +EXPORT_SYMBOL(host1x_syncpt_incr_max); + +void host1x_syncpt_incr(u32 id) +{ + struct nvhost_syncpt *sp = &nvhost->syncpt; + nvhost_syncpt_incr(sp, id); +} +EXPORT_SYMBOL(host1x_syncpt_incr); + +u32 host1x_syncpt_read(u32 id) +{ + struct nvhost_syncpt *sp = &nvhost->syncpt; + return nvhost_syncpt_read(sp, id); +} +EXPORT_SYMBOL(host1x_syncpt_read); + +bool host1x_powered(struct platform_device *dev) +{ + bool ret = 0; + + /* get the parent */ + if (dev->dev.parent) { + struct platform_device *pdev; + pdev = to_platform_device(dev->dev.parent); + + ret = nvhost_module_powered(pdev); + } else { + dev_warn(&dev->dev, "Cannot return power state, no parent\n"); + } + + return ret; +} +EXPORT_SYMBOL(host1x_powered); + +void host1x_busy(struct platform_device *dev) +{ + /* get the parent */ + if (dev->dev.parent) { + struct platform_device *pdev; + pdev = to_platform_device(dev->dev.parent); + + nvhost_module_busy(pdev); + } else { + dev_warn(&dev->dev, "Cannot turn on, no parent\n"); + } +} +EXPORT_SYMBOL(host1x_busy); + +void host1x_idle(struct platform_device *dev) +{ + /* get the parent */ + if (dev->dev.parent) { + struct platform_device *pdev; + pdev = to_platform_device(dev->dev.parent); + + nvhost_module_idle(pdev); + } else { + dev_warn(&dev->dev, "Cannot idle, no parent\n"); + } +} +EXPORT_SYMBOL(host1x_idle); + +MODULE_AUTHOR("Terje Bergstrom <tbergstrom@nvidia.com>"); +MODULE_DESCRIPTION("Host1x driver for Tegra products"); +MODULE_VERSION("1.0"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform-nvhost"); diff --git a/drivers/video/tegra/host/host1x/Makefile b/drivers/video/tegra/host/host1x/Makefile new file mode 100644 index 0000000..330d507 --- /dev/null +++ b/drivers/video/tegra/host/host1x/Makefile @@ -0,0 +1,7 @@ +ccflags-y = -Idrivers/video/tegra/host + +nvhost-host1x-objs = \ + host1x.o \ + host1x01.o + +obj-$(CONFIG_TEGRA_HOST1X) += nvhost-host1x.o diff --git a/drivers/video/tegra/host/host1x/host1x.c b/drivers/video/tegra/host/host1x/host1x.c new file mode 100644 index 0000000..77ff00b --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x.c @@ -0,0 +1,204 @@ +/* + * drivers/video/tegra/host/host1x.c + * + * Tegra host1x Driver Entrypoint + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/slab.h> +#include <linux/fs.h> +#include <linux/cdev.h> +#include <linux/uaccess.h> +#include <linux/file.h> +#include <linux/clk.h> +#include <linux/hrtimer.h> +#include <linux/module.h> +#include <linux/of.h> +#include <linux/nvhost.h> + +#include "host1x/host1x.h" +#include "nvhost_acm.h" +#include "chip_support.h" + +#define DRIVER_NAME "tegra-host1x" + +struct nvhost_master *nvhost; + +static void power_on_host(struct platform_device *dev) +{ + struct nvhost_master *host = nvhost_get_private_data(dev); + + nvhost_syncpt_reset(&host->syncpt); +} + +static int power_off_host(struct platform_device *dev) +{ + struct nvhost_master *host = nvhost_get_private_data(dev); + + nvhost_syncpt_save(&host->syncpt); + return 0; +} + +static void nvhost_free_resources(struct nvhost_master *host) +{ +} + +static int __devinit nvhost_alloc_resources(struct nvhost_master *host) +{ + int err; + + err = nvhost_init_chip_support(host); + if (err) + return err; + + return 0; +} + +static int __devinit nvhost_probe(struct platform_device *dev) +{ + struct nvhost_master *host; + struct resource *regs, *intr0, *intr1; + int i, err; + struct nvhost_device_data *pdata = + (struct nvhost_device_data *)dev->dev.platform_data; + + regs = platform_get_resource(dev, IORESOURCE_MEM, 0); + intr0 = platform_get_resource(dev, IORESOURCE_IRQ, 0); + intr1 = platform_get_resource(dev, IORESOURCE_IRQ, 1); + + if (!regs || !intr0 || !intr1) { + dev_err(&dev->dev, "missing required platform resources\n"); + return -ENXIO; + } + + host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL); + if (!host) + return -ENOMEM; + + nvhost = host; + + host->dev = dev; + + /* Copy host1x parameters. The private_data gets replaced + * by nvhost_master later */ + memcpy(&host->info, pdata->private_data, + sizeof(struct host1x_device_info)); + + pdata->finalize_poweron = power_on_host; + pdata->prepare_poweroff = power_off_host; + + pdata->pdev = dev; + + /* set common host1x device data */ + platform_set_drvdata(dev, pdata); + + /* set private host1x device data */ + nvhost_set_private_data(dev, host); + + host->aperture = devm_request_and_ioremap(&dev->dev, regs); + if (!host->aperture) { + dev_err(&dev->dev, "failed to remap host registers\n"); + err = -ENXIO; + goto fail; + } + + err = nvhost_alloc_resources(host); + if (err) { + dev_err(&dev->dev, "failed to init chip support\n"); + goto fail; + } + + err = nvhost_syncpt_init(dev, &host->syncpt); + if (err) + goto fail; + + err = nvhost_module_init(dev); + if (err) + goto fail; + + for (i = 0; i < pdata->num_clks; i++) + clk_prepare_enable(pdata->clk[i]); + nvhost_syncpt_reset(&host->syncpt); + for (i = 0; i < pdata->num_clks; i++) + clk_disable_unprepare(pdata->clk[i]); + + dev_info(&dev->dev, "initialized\n"); + + return 0; + +fail: + nvhost_free_resources(host); + kfree(host); + return err; +} + +static int __exit nvhost_remove(struct platform_device *dev) +{ + struct nvhost_master *host = nvhost_get_private_data(dev); + nvhost_syncpt_deinit(&host->syncpt); + nvhost_module_deinit(dev); + nvhost_free_resources(host); + return 0; +} + +static int nvhost_suspend(struct platform_device *dev, pm_message_t state) +{ + struct nvhost_master *host = nvhost_get_private_data(dev); + int ret = 0; + + ret = nvhost_module_suspend(host->dev); + dev_info(&dev->dev, "suspend status: %d\n", ret); + + return ret; +} + +static int nvhost_resume(struct platform_device *dev) +{ + dev_info(&dev->dev, "resuming\n"); + return 0; +} + +static struct of_device_id host1x_match[] __devinitdata = { + { .compatible = "nvidia,tegra20-host1x", }, + { .compatible = "nvidia,tegra30-host1x", }, + { }, +}; + +static struct platform_driver platform_driver = { + .probe = nvhost_probe, + .remove = __exit_p(nvhost_remove), + .suspend = nvhost_suspend, + .resume = nvhost_resume, + .driver = { + .owner = THIS_MODULE, + .name = DRIVER_NAME, + .of_match_table = of_match_ptr(host1x_match), + }, +}; + +static int __init nvhost_mod_init(void) +{ + return platform_driver_register(&platform_driver); +} + +static void __exit nvhost_mod_exit(void) +{ + platform_driver_unregister(&platform_driver); +} + +module_init(nvhost_mod_init); +module_exit(nvhost_mod_exit); + diff --git a/drivers/video/tegra/host/host1x/host1x.h b/drivers/video/tegra/host/host1x/host1x.h new file mode 100644 index 0000000..76748ac --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x.h @@ -0,0 +1,78 @@ +/* + * drivers/video/tegra/host/host1x/host1x.h + * + * Tegra host1x Driver Entrypoint + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __NVHOST_HOST1X_H +#define __NVHOST_HOST1X_H + +#include <linux/cdev.h> +#include <linux/nvhost.h> + +#include "nvhost_syncpt.h" + +#define TRACE_MAX_LENGTH 128U +#define IFACE_NAME "nvhost" + +struct nvhost_master { + void __iomem *aperture; + void __iomem *sync_aperture; + struct nvhost_syncpt syncpt; + struct platform_device *dev; + struct host1x_device_info info; +}; + +extern struct nvhost_master *nvhost; + +static inline void *nvhost_get_private_data(struct platform_device *_dev) +{ + struct nvhost_device_data *pdata = + (struct nvhost_device_data *)platform_get_drvdata(_dev); + WARN_ON(!pdata); + return (pdata && pdata->private_data) ? pdata->private_data : NULL; +} + +static inline void nvhost_set_private_data(struct platform_device *_dev, + void *priv_data) +{ + struct nvhost_device_data *pdata = + (struct nvhost_device_data *)platform_get_drvdata(_dev); + WARN_ON(!pdata); + if (pdata) + pdata->private_data = priv_data; +} + +static inline +struct nvhost_master *nvhost_get_host(struct platform_device *_dev) +{ + struct platform_device *pdev; + + if (_dev->dev.parent) { + pdev = to_platform_device(_dev->dev.parent); + return nvhost_get_private_data(pdev); + } else + return nvhost_get_private_data(_dev); +} + +static inline +struct platform_device *nvhost_get_parent(struct platform_device *_dev) +{ + return _dev->dev.parent ? to_platform_device(_dev->dev.parent) : NULL; +} + +#endif diff --git a/drivers/video/tegra/host/host1x/host1x01.c b/drivers/video/tegra/host/host1x/host1x01.c new file mode 100644 index 0000000..d53302d --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x01.c @@ -0,0 +1,37 @@ +/* + * drivers/video/tegra/host/host1x01.c + * + * Host1x init for T20 and T30 Architecture Chips + * + * Copyright (c) 2011-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/nvhost.h> + +#include "host1x/host1x01.h" +#include "host1x/host1x.h" +#include "host1x/host1x01_hardware.h" +#include "chip_support.h" + +#include "host1x/host1x_syncpt.c" + +int nvhost_init_host1x01_support(struct nvhost_master *host, + struct nvhost_chip_support *op) +{ + host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE; + op->syncpt = host1x_syncpt_ops; + + return 0; +} diff --git a/drivers/video/tegra/host/host1x/host1x01.h b/drivers/video/tegra/host/host1x/host1x01.h new file mode 100644 index 0000000..91624d66 --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x01.h @@ -0,0 +1,29 @@ +/* + * drivers/video/tegra/host/host1x01.h + * + * Host1x init for T20 and T30 Architecture Chips + * + * Copyright (c) 2011-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ +#ifndef NVHOST_HOST1X01_H +#define NVHOST_HOST1X01_H + +struct nvhost_master; +struct nvhost_chip_support; + +int nvhost_init_host1x01_support(struct nvhost_master *, + struct nvhost_chip_support *); + +#endif /* NVHOST_HOST1X01_H_ */ diff --git a/drivers/video/tegra/host/host1x/host1x01_hardware.h b/drivers/video/tegra/host/host1x/host1x01_hardware.h new file mode 100644 index 0000000..0da7e06 --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x01_hardware.h @@ -0,0 +1,36 @@ +/* + * drivers/video/tegra/host/host1x/host1x01_hardware.h + * + * Tegra host1x Register Offsets for Tegra20 and Tegra30 + * + * Copyright (c) 2010-2012 NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __NVHOST_HOST1X01_HARDWARE_H +#define __NVHOST_HOST1X01_HARDWARE_H + +#include <linux/types.h> +#include <linux/bitops.h> +#include "hw_host1x01_sync.h" + +/* channel registers */ +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384 +#define NV_HOST1X_SYNC_MLOCK_NUM 16 + +/* sync registers */ +#define HOST1X_CHANNEL_SYNC_REG_BASE 0x3000 +#define NV_HOST1X_NB_MLOCKS 16 + +#endif diff --git a/drivers/video/tegra/host/host1x/host1x_syncpt.c b/drivers/video/tegra/host/host1x/host1x_syncpt.c new file mode 100644 index 0000000..57cc1b1 --- /dev/null +++ b/drivers/video/tegra/host/host1x/host1x_syncpt.c @@ -0,0 +1,156 @@ +/* + * drivers/video/tegra/host/host1x/host1x_syncpt.c + * + * Tegra host1x Syncpoints + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/io.h> +#include "nvhost_syncpt.h" +#include "nvhost_acm.h" +#include "host1x.h" +#include "chip_support.h" + +/** + * Write the current syncpoint value back to hw. + */ +static void host1x_syncpt_reset(struct nvhost_syncpt *sp, u32 id) +{ + struct nvhost_master *dev = syncpt_to_dev(sp); + int min = nvhost_syncpt_read_min(sp, id); + writel(min, dev->sync_aperture + (host1x_sync_syncpt_0_r() + id * 4)); +} + +/** + * Write the current waitbase value back to hw. + */ +static void host1x_syncpt_reset_wait_base(struct nvhost_syncpt *sp, u32 id) +{ + struct nvhost_master *dev = syncpt_to_dev(sp); + writel(sp->base_val[id], + dev->sync_aperture + (host1x_sync_syncpt_base_0_r() + id * 4)); +} + +/** + * Read waitbase value from hw. + */ +static void host1x_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id) +{ + struct nvhost_master *dev = syncpt_to_dev(sp); + sp->base_val[id] = readl(dev->sync_aperture + + (host1x_sync_syncpt_base_0_r() + id * 4)); +} + +/** + * Updates the last value read from hardware. + * (was nvhost_syncpt_update_min) + */ +static u32 host1x_syncpt_update_min(struct nvhost_syncpt *sp, u32 id) +{ + struct nvhost_master *dev = syncpt_to_dev(sp); + void __iomem *sync_regs = dev->sync_aperture; + u32 old, live; + + do { + old = nvhost_syncpt_read_min(sp, id); + live = readl(sync_regs + (host1x_sync_syncpt_0_r() + id * 4)); + } while ((u32)atomic_cmpxchg(&sp->min_val[id], old, live) != old); + + if (!nvhost_syncpt_check_max(sp, id, live)) + dev_err(&syncpt_to_dev(sp)->dev->dev, + "%s failed: id=%u, min=%d, max=%d\n", + __func__, + id, + nvhost_syncpt_read_min(sp, id), + nvhost_syncpt_read_max(sp, id)); + + return live; +} + +/** + * Write a cpu syncpoint increment to the hardware, without touching + * the cache. Caller is responsible for host being powered. + */ +static void host1x_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id) +{ + struct nvhost_master *dev = syncpt_to_dev(sp); + u32 reg_offset = id / 32; + + if (!nvhost_module_powered(dev->dev)) { + dev_err(&syncpt_to_dev(sp)->dev->dev, + "Trying to access host1x when it's off"); + return; + } + + if (!nvhost_syncpt_client_managed(sp, id) + && nvhost_syncpt_min_eq_max(sp, id)) { + dev_err(&syncpt_to_dev(sp)->dev->dev, + "Trying to increment syncpoint id %d beyond max\n", + id); + return; + } + writel(BIT_MASK(id), dev->sync_aperture + + host1x_sync_syncpt_cpu_incr_r() + reg_offset * 4); + wmb(); +} + +static const char *host1x_syncpt_name(struct nvhost_syncpt *sp, u32 id) +{ + struct host1x_device_info *info = &syncpt_to_dev(sp)->info; + const char *name = NULL; + + if (id < info->nb_pts) + name = info->syncpt_names[id]; + + return name ? name : ""; +} + +static void host1x_syncpt_debug(struct nvhost_syncpt *sp) +{ + u32 i; + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { + u32 max = nvhost_syncpt_read_max(sp, i); + u32 min = nvhost_syncpt_update_min(sp, i); + if (!max && !min) + continue; + dev_info(&syncpt_to_dev(sp)->dev->dev, + "id %d (%s) min %d max %d\n", + i, syncpt_op().name(sp, i), + min, max); + + } + + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) { + u32 base_val; + host1x_syncpt_read_wait_base(sp, i); + base_val = sp->base_val[i]; + if (base_val) + dev_info(&syncpt_to_dev(sp)->dev->dev, + "waitbase id %d val %d\n", + i, base_val); + + } +} + +static const struct nvhost_syncpt_ops host1x_syncpt_ops = { + .reset = host1x_syncpt_reset, + .reset_wait_base = host1x_syncpt_reset_wait_base, + .read_wait_base = host1x_syncpt_read_wait_base, + .update_min = host1x_syncpt_update_min, + .cpu_incr = host1x_syncpt_cpu_incr, + .debug = host1x_syncpt_debug, + .name = host1x_syncpt_name, +}; diff --git a/drivers/video/tegra/host/host1x/hw_host1x01_sync.h b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h new file mode 100644 index 0000000..67f0cbf --- /dev/null +++ b/drivers/video/tegra/host/host1x/hw_host1x01_sync.h @@ -0,0 +1,398 @@ +/* + * drivers/video/tegra/host/host1x/hw_host1x_sync_host1x.h + * + * Copyright (c) 2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + * + */ + + /* + * Function naming determines intended use: + * + * <x>_r(void) : Returns the offset for register <x>. + * + * <x>_w(void) : Returns the word offset for word (4 byte) element <x>. + * + * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits. + * + * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted + * and masked to place it at field <y> of register <x>. This value + * can be |'d with others to produce a full register value for + * register <x>. + * + * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This + * value can be ~'d and then &'d to clear the value of field <y> for + * register <x>. + * + * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted + * to place it at field <y> of register <x>. This value can be |'d + * with others to produce a full register value for <x>. + * + * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register + * <x> value 'r' after being shifted to place its LSB at bit 0. + * This value is suitable for direct comparison with other unshifted + * values appropriate for use in field <y> of register <x>. + * + * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for + * field <y> of register <x>. This value is suitable for direct + * comparison with unshifted values appropriate for use in field <y> + * of register <x>. + */ + +#ifndef __hw_host1x_sync_host1x_h__ +#define __hw_host1x_sync_host1x_h__ +/*This file is autogenerated. Do not edit. */ + +static inline u32 host1x_sync_intmask_r(void) +{ + return 0x4; +} +static inline u32 host1x_sync_intc0mask_r(void) +{ + return 0x8; +} +static inline u32 host1x_sync_hintstatus_r(void) +{ + return 0x20; +} +static inline u32 host1x_sync_hintmask_r(void) +{ + return 0x24; +} +static inline u32 host1x_sync_hintstatus_ext_r(void) +{ + return 0x28; +} +static inline u32 host1x_sync_hintstatus_ext_ip_read_int_s(void) +{ + return 1; +} +static inline u32 host1x_sync_hintstatus_ext_ip_read_int_f(u32 v) +{ + return (v & 0x1) << 30; +} +static inline u32 host1x_sync_hintstatus_ext_ip_read_int_m(void) +{ + return 0x1 << 30; +} +static inline u32 host1x_sync_hintstatus_ext_ip_read_int_v(u32 r) +{ + return (r >> 30) & 0x1; +} +static inline u32 host1x_sync_hintstatus_ext_ip_write_int_s(void) +{ + return 1; +} +static inline u32 host1x_sync_hintstatus_ext_ip_write_int_f(u32 v) +{ + return (v & 0x1) << 31; +} +static inline u32 host1x_sync_hintstatus_ext_ip_write_int_m(void) +{ + return 0x1 << 31; +} +static inline u32 host1x_sync_hintstatus_ext_ip_write_int_v(u32 r) +{ + return (r >> 31) & 0x1; +} +static inline u32 host1x_sync_hintmask_ext_r(void) +{ + return 0x2c; +} +static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void) +{ + return 0x40; +} +static inline u32 host1x_sync_syncpt_thresh_cpu1_int_status_r(void) +{ + return 0x48; +} +static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void) +{ + return 0x60; +} +static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void) +{ + return 0x68; +} +static inline u32 host1x_sync_cf0_setup_r(void) +{ + return 0x80; +} +static inline u32 host1x_sync_cf0_setup_cf0_base_s(void) +{ + return 9; +} +static inline u32 host1x_sync_cf0_setup_cf0_base_f(u32 v) +{ + return (v & 0x1ff) << 0; +} +static inline u32 host1x_sync_cf0_setup_cf0_base_m(void) +{ + return 0x1ff << 0; +} +static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r) +{ + return (r >> 0) & 0x1ff; +} +static inline u32 host1x_sync_cf0_setup_cf0_limit_s(void) +{ + return 9; +} +static inline u32 host1x_sync_cf0_setup_cf0_limit_f(u32 v) +{ + return (v & 0x1ff) << 16; +} +static inline u32 host1x_sync_cf0_setup_cf0_limit_m(void) +{ + return 0x1ff << 16; +} +static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r) +{ + return (r >> 16) & 0x1ff; +} +static inline u32 host1x_sync_cmdproc_stop_r(void) +{ + return 0xac; +} +static inline u32 host1x_sync_ch_teardown_r(void) +{ + return 0xb0; +} +static inline u32 host1x_sync_usec_clk_r(void) +{ + return 0x1a4; +} +static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void) +{ + return 0x1a8; +} +static inline u32 host1x_sync_ip_busy_timeout_r(void) +{ + return 0x1bc; +} +static inline u32 host1x_sync_ip_read_timeout_addr_r(void) +{ + return 0x1c0; +} +static inline u32 host1x_sync_ip_write_timeout_addr_r(void) +{ + return 0x1c4; +} +static inline u32 host1x_sync_mlock_0_r(void) +{ + return 0x2c0; +} +static inline u32 host1x_sync_mlock_owner_0_r(void) +{ + return 0x340; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_s(void) +{ + return 4; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v) +{ + return (v & 0xf) << 8; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_m(void) +{ + return 0xf << 8; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_v(u32 r) +{ + return (r >> 8) & 0xf; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_s(void) +{ + return 1; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_f(u32 v) +{ + return (v & 0x1) << 1; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_m(void) +{ + return 0x1 << 1; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r) +{ + return (r >> 1) & 0x1; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_s(void) +{ + return 1; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_f(u32 v) +{ + return (v & 0x1) << 0; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_m(void) +{ + return 0x1 << 0; +} +static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r) +{ + return (r >> 0) & 0x1; +} +static inline u32 host1x_sync_syncpt_0_r(void) +{ + return 0x400; +} +static inline u32 host1x_sync_syncpt_int_thresh_0_r(void) +{ + return 0x500; +} +static inline u32 host1x_sync_syncpt_base_0_r(void) +{ + return 0x600; +} +static inline u32 host1x_sync_syncpt_cpu_incr_r(void) +{ + return 0x700; +} +static inline u32 host1x_sync_cbread0_r(void) +{ + return 0x720; +} +static inline u32 host1x_sync_cfpeek_ctrl_r(void) +{ + return 0x74c; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_s(void) +{ + return 9; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) +{ + return (v & 0x1ff) << 0; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_m(void) +{ + return 0x1ff << 0; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_v(u32 r) +{ + return (r >> 0) & 0x1ff; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_s(void) +{ + return 3; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v) +{ + return (v & 0x7) << 16; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_m(void) +{ + return 0x7 << 16; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_v(u32 r) +{ + return (r >> 16) & 0x7; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_s(void) +{ + return 1; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v) +{ + return (v & 0x1) << 31; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_m(void) +{ + return 0x1 << 31; +} +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_v(u32 r) +{ + return (r >> 31) & 0x1; +} +static inline u32 host1x_sync_cfpeek_read_r(void) +{ + return 0x750; +} +static inline u32 host1x_sync_cfpeek_ptrs_r(void) +{ + return 0x754; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_s(void) +{ + return 9; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_f(u32 v) +{ + return (v & 0x1ff) << 0; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_m(void) +{ + return 0x1ff << 0; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r) +{ + return (r >> 0) & 0x1ff; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_s(void) +{ + return 9; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_f(u32 v) +{ + return (v & 0x1ff) << 16; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_m(void) +{ + return 0x1ff << 16; +} +static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r) +{ + return (r >> 16) & 0x1ff; +} +static inline u32 host1x_sync_cbstat_0_r(void) +{ + return 0x758; +} +static inline u32 host1x_sync_cbstat_0_cboffset0_s(void) +{ + return 16; +} +static inline u32 host1x_sync_cbstat_0_cboffset0_f(u32 v) +{ + return (v & 0xffff) << 0; +} +static inline u32 host1x_sync_cbstat_0_cboffset0_m(void) +{ + return 0xffff << 0; +} +static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r) +{ + return (r >> 0) & 0xffff; +} +static inline u32 host1x_sync_cbstat_0_cbclass0_s(void) +{ + return 10; +} +static inline u32 host1x_sync_cbstat_0_cbclass0_f(u32 v) +{ + return (v & 0x3ff) << 16; +} +static inline u32 host1x_sync_cbstat_0_cbclass0_m(void) +{ + return 0x3ff << 16; +} +static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r) +{ + return (r >> 16) & 0x3ff; +} + +#endif /* __hw_host1x_sync_host1x_h__ */ diff --git a/drivers/video/tegra/host/nvhost_acm.c b/drivers/video/tegra/host/nvhost_acm.c new file mode 100644 index 0000000..15cf395 --- /dev/null +++ b/drivers/video/tegra/host/nvhost_acm.c @@ -0,0 +1,481 @@ +/* + * drivers/video/tegra/host/nvhost_acm.c + * + * Tegra host1x Automatic Clock Management + * + * Copyright (c) 2010-2012, NVIDIA Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/stat.h> +#include <linux/string.h> +#include <linux/sched.h> +#include <linux/clk.h> +#include <linux/err.h> +#include <linux/device.h> +#include <linux/delay.h> +#include <linux/platform_device.h> + +#include <mach/powergate.h> +#include <mach/clk.h> + +#include "nvhost_acm.h" + +#define ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT (2 * HZ) +#define POWERGATE_DELAY 10 +#define MAX_DEVID_LENGTH 16 + +static void do_powergate_locked(int id) +{ + if (id != -1 && tegra_powergate_is_powered(id)) + tegra_powergate_power_off(id); +} + +static void do_unpowergate_locked(int id) +{ + if (id != -1) + tegra_powergate_power_on(id); +} + +static void to_state_clockgated_locked(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + if (pdata->powerstate == NVHOST_POWER_STATE_RUNNING) { + int i, err; + if (pdata->prepare_clockoff) { + err = pdata->prepare_clockoff(dev); + if (err) { + dev_err(&dev->dev, "error clock gating"); + return; + } + } + + for (i = 0; i < pdata->num_clks; i++) + clk_disable_unprepare(pdata->clk[i]); + if (dev->dev.parent) + nvhost_module_idle(to_platform_device(dev->dev.parent)); + } else if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED + && pdata->can_powergate) { + do_unpowergate_locked(pdata->powergate_ids[0]); + do_unpowergate_locked(pdata->powergate_ids[1]); + } + pdata->powerstate = NVHOST_POWER_STATE_CLOCKGATED; +} + +static void to_state_running_locked(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + int prev_state = pdata->powerstate; + + if (pdata->powerstate == NVHOST_POWER_STATE_POWERGATED) + to_state_clockgated_locked(dev); + + if (pdata->powerstate == NVHOST_POWER_STATE_CLOCKGATED) { + int i; + + if (dev->dev.parent) + nvhost_module_busy(to_platform_device(dev->dev.parent)); + + for (i = 0; i < pdata->num_clks; i++) { + int err = clk_prepare_enable(pdata->clk[i]); + if (err) { + dev_err(&dev->dev, "Cannot turn on clock %s", + pdata->clocks[i].name); + return; + } + } + + if (pdata->finalize_clockon) + pdata->finalize_clockon(dev); + + /* Invoke callback after power un-gating. This is used for + * restoring context. */ + if (prev_state == NVHOST_POWER_STATE_POWERGATED + && pdata->finalize_poweron) + pdata->finalize_poweron(dev); + } + pdata->powerstate = NVHOST_POWER_STATE_RUNNING; +} + +/* This gets called from powergate_handler() and from module suspend. + * Module suspend is done for all modules, runtime power gating only + * for modules with can_powergate set. + */ +static int to_state_powergated_locked(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + int err = 0; + + if (pdata->prepare_poweroff && + pdata->powerstate != NVHOST_POWER_STATE_POWERGATED) { + /* Clock needs to be on in prepare_poweroff */ + to_state_running_locked(dev); + err = pdata->prepare_poweroff(dev); + if (err) + return err; + } + + if (pdata->powerstate == NVHOST_POWER_STATE_RUNNING) + to_state_clockgated_locked(dev); + + if (pdata->can_powergate) { + do_powergate_locked(pdata->powergate_ids[0]); + do_powergate_locked(pdata->powergate_ids[1]); + } + + pdata->powerstate = NVHOST_POWER_STATE_POWERGATED; + return 0; +} + +static void schedule_powergating_locked(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + if (pdata->can_powergate) + schedule_delayed_work(&pdata->powerstate_down, + msecs_to_jiffies(pdata->powergate_delay)); +} + +static void schedule_clockgating_locked(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + schedule_delayed_work(&pdata->powerstate_down, + msecs_to_jiffies(pdata->clockgate_delay)); +} + +void nvhost_module_busy(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + if (pdata->busy) + pdata->busy(dev); + + mutex_lock(&pdata->lock); + cancel_delayed_work(&pdata->powerstate_down); + + pdata->refcount++; + if (pdata->refcount > 0 && !nvhost_module_powered(dev)) + to_state_running_locked(dev); + mutex_unlock(&pdata->lock); +} + +static void powerstate_down_handler(struct work_struct *work) +{ + struct platform_device *dev; + struct nvhost_device_data *pdata; + + pdata = container_of(to_delayed_work(work), + struct nvhost_device_data, + powerstate_down); + + dev = pdata->pdev; + + mutex_lock(&pdata->lock); + if (pdata->refcount == 0) { + switch (pdata->powerstate) { + case NVHOST_POWER_STATE_RUNNING: + to_state_clockgated_locked(dev); + schedule_powergating_locked(dev); + break; + case NVHOST_POWER_STATE_CLOCKGATED: + if (to_state_powergated_locked(dev)) + schedule_powergating_locked(dev); + break; + default: + break; + } + } + mutex_unlock(&pdata->lock); +} + +void nvhost_module_idle_mult(struct platform_device *dev, int refs) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + bool kick = false; + + mutex_lock(&pdata->lock); + pdata->refcount -= refs; + if (pdata->refcount == 0) { + if (nvhost_module_powered(dev)) + schedule_clockgating_locked(dev); + kick = true; + } + mutex_unlock(&pdata->lock); + + if (kick) { + wake_up(&pdata->idle_wq); + + if (pdata->idle) + pdata->idle(dev); + } +} + +static ssize_t refcount_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int ret; + struct nvhost_device_power_attr *power_attribute = + container_of(attr, struct nvhost_device_power_attr, + power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT]); + struct platform_device *dev = power_attribute->ndev; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + mutex_lock(&pdata->lock); + ret = sprintf(buf, "%d\n", pdata->refcount); + mutex_unlock(&pdata->lock); + + return ret; +} + +static ssize_t powergate_delay_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + int powergate_delay = 0, ret = 0; + struct nvhost_device_power_attr *power_attribute = + container_of(attr, struct nvhost_device_power_attr, + power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]); + struct platform_device *dev = power_attribute->ndev; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + if (!pdata->can_powergate) { + dev_info(&dev->dev, "does not support power-gating\n"); + return count; + } + + mutex_lock(&pdata->lock); + ret = sscanf(buf, "%d", &powergate_delay); + if (ret == 1 && powergate_delay >= 0) + pdata->powergate_delay = powergate_delay; + else + dev_err(&dev->dev, "Invalid powergate delay\n"); + mutex_unlock(&pdata->lock); + + return count; +} + +static ssize_t powergate_delay_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int ret; + struct nvhost_device_power_attr *power_attribute = + container_of(attr, struct nvhost_device_power_attr, + power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]); + struct platform_device *dev = power_attribute->ndev; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + mutex_lock(&pdata->lock); + ret = sprintf(buf, "%d\n", pdata->powergate_delay); + mutex_unlock(&pdata->lock); + + return ret; +} + +static ssize_t clockgate_delay_store(struct kobject *kobj, + struct kobj_attribute *attr, const char *buf, size_t count) +{ + int clockgate_delay = 0, ret = 0; + struct nvhost_device_power_attr *power_attribute = + container_of(attr, struct nvhost_device_power_attr, + power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]); + struct platform_device *dev = power_attribute->ndev; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + mutex_lock(&pdata->lock); + ret = sscanf(buf, "%d", &clockgate_delay); + if (ret == 1 && clockgate_delay >= 0) + pdata->clockgate_delay = clockgate_delay; + else + dev_err(&dev->dev, "Invalid clockgate delay\n"); + mutex_unlock(&pdata->lock); + + return count; +} + +static ssize_t clockgate_delay_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + int ret; + struct nvhost_device_power_attr *power_attribute = + container_of(attr, struct nvhost_device_power_attr, + power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]); + struct platform_device *dev = power_attribute->ndev; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + mutex_lock(&pdata->lock); + ret = sprintf(buf, "%d\n", pdata->clockgate_delay); + mutex_unlock(&pdata->lock); + + return ret; +} + +int nvhost_module_init(struct platform_device *dev) +{ + int i = 0, err = 0; + struct kobj_attribute *attr = NULL; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + /* initialize clocks to known state */ + while (pdata->clocks[i].name && i < NVHOST_MODULE_MAX_CLOCKS) { + long rate = pdata->clocks[i].default_rate; + struct clk *c; + + c = devm_clk_get(&dev->dev, pdata->clocks[i].name); + if (IS_ERR_OR_NULL(c)) { + dev_err(&dev->dev, "Cannot get clock %s\n", + pdata->clocks[i].name); + return -ENODEV; + } + + rate = clk_round_rate(c, rate); + clk_prepare_enable(c); + clk_set_rate(c, rate); + clk_disable_unprepare(c); + pdata->clk[i] = c; + i++; + } + pdata->num_clks = i; + + mutex_init(&pdata->lock); + init_waitqueue_head(&pdata->idle_wq); + INIT_DELAYED_WORK(&pdata->powerstate_down, powerstate_down_handler); + + /* power gate units that we can power gate */ + if (pdata->can_powergate) { + do_powergate_locked(pdata->powergate_ids[0]); + do_powergate_locked(pdata->powergate_ids[1]); + pdata->powerstate = NVHOST_POWER_STATE_POWERGATED; + } else { + do_unpowergate_locked(pdata->powergate_ids[0]); + do_unpowergate_locked(pdata->powergate_ids[1]); + pdata->powerstate = NVHOST_POWER_STATE_CLOCKGATED; + } + + /* Init the power sysfs attributes for this device */ + pdata->power_attrib = devm_kzalloc(&dev->dev, + sizeof(struct nvhost_device_power_attr), + GFP_KERNEL); + if (!pdata->power_attrib) { + dev_err(&dev->dev, "Unable to allocate sysfs attributes\n"); + return -ENOMEM; + } + pdata->power_attrib->ndev = dev; + + pdata->power_kobj = kobject_create_and_add("acm", &dev->dev.kobj); + if (!pdata->power_kobj) { + dev_err(&dev->dev, "Could not add dir 'power'\n"); + err = -EIO; + goto fail_attrib_alloc; + } + + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]; + attr->attr.name = "clockgate_delay"; + attr->attr.mode = S_IWUSR | S_IRUGO; + attr->show = clockgate_delay_show; + attr->store = clockgate_delay_store; + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { + dev_err(&dev->dev, "Could not create sysfs attribute clockgate_delay\n"); + err = -EIO; + goto fail_clockdelay; + } + + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]; + attr->attr.name = "powergate_delay"; + attr->attr.mode = S_IWUSR | S_IRUGO; + attr->show = powergate_delay_show; + attr->store = powergate_delay_store; + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { + dev_err(&dev->dev, "Could not create sysfs attribute powergate_delay\n"); + err = -EIO; + goto fail_powergatedelay; + } + + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT]; + attr->attr.name = "refcount"; + attr->attr.mode = S_IRUGO; + attr->show = refcount_show; + if (sysfs_create_file(pdata->power_kobj, &attr->attr)) { + dev_err(&dev->dev, "Could not create sysfs attribute refcount\n"); + err = -EIO; + goto fail_refcount; + } + + return 0; + +fail_refcount: + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY]; + sysfs_remove_file(pdata->power_kobj, &attr->attr); + +fail_powergatedelay: + attr = &pdata->power_attrib->power_attr[NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY]; + sysfs_remove_file(pdata->power_kobj, &attr->attr); + +fail_clockdelay: + kobject_put(pdata->power_kobj); + +fail_attrib_alloc: + kfree(pdata->power_attrib); + + return err; +} + +static int is_module_idle(struct platform_device *dev) +{ + int count; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + mutex_lock(&pdata->lock); + count = pdata->refcount; + mutex_unlock(&pdata->lock); + + return (count == 0); +} + +int nvhost_module_suspend(struct platform_device *dev) +{ + int ret; + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + ret = wait_event_timeout(pdata->idle_wq, is_module_idle(dev), + ACM_SUSPEND_WAIT_FOR_IDLE_TIMEOUT); + if (ret == 0) { + dev_info(&dev->dev, "%s prevented suspend\n", + dev_name(&dev->dev)); + return -EBUSY; + } + + mutex_lock(&pdata->lock); + cancel_delayed_work(&pdata->powerstate_down); + to_state_powergated_locked(dev); + mutex_unlock(&pdata->lock); + + if (pdata->suspend_ndev) + pdata->suspend_ndev(dev); + + return 0; +} + +void nvhost_module_deinit(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + + kobject_put(pdata->power_kobj); + + if (pdata->deinit) + pdata->deinit(dev); + + nvhost_module_suspend(dev); + pdata->powerstate = NVHOST_POWER_STATE_DEINIT; +} diff --git a/drivers/video/tegra/host/nvhost_acm.h b/drivers/video/tegra/host/nvhost_acm.h new file mode 100644 index 0000000..0892a57 --- /dev/null +++ b/drivers/video/tegra/host/nvhost_acm.h @@ -0,0 +1,45 @@ +/* + * drivers/video/tegra/host/nvhost_acm.h + * + * Tegra host1x Automatic Clock Management + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __NVHOST_ACM_H +#define __NVHOST_ACM_H + +#include <linux/nvhost.h> + +/* Sets clocks and powergating state for a module */ +int nvhost_module_init(struct platform_device *ndev); +void nvhost_module_deinit(struct platform_device *dev); +int nvhost_module_suspend(struct platform_device *dev); + +void nvhost_module_busy(struct platform_device *dev); +void nvhost_module_idle_mult(struct platform_device *dev, int refs); + +static inline bool nvhost_module_powered(struct platform_device *dev) +{ + struct nvhost_device_data *pdata = platform_get_drvdata(dev); + return pdata->powerstate == NVHOST_POWER_STATE_RUNNING; +} + +static inline void nvhost_module_idle(struct platform_device *dev) +{ + nvhost_module_idle_mult(dev, 1); +} + +#endif diff --git a/drivers/video/tegra/host/nvhost_syncpt.c b/drivers/video/tegra/host/nvhost_syncpt.c new file mode 100644 index 0000000..d7c8230 --- /dev/null +++ b/drivers/video/tegra/host/nvhost_syncpt.c @@ -0,0 +1,333 @@ +/* + * drivers/video/tegra/host/nvhost_syncpt.c + * + * Tegra host1x Syncpoints + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/stat.h> +#include "nvhost_syncpt.h" +#include "nvhost_acm.h" +#include "host1x/host1x.h" +#include "chip_support.h" + +#define MAX_SYNCPT_LENGTH 5 + +/* Name of sysfs node for min and max value */ +static const char *min_name = "min"; +static const char *max_name = "max"; + +/** + * Resets syncpoint and waitbase values to sw shadows + */ +void nvhost_syncpt_reset(struct nvhost_syncpt *sp) +{ + u32 i; + + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) + syncpt_op().reset(sp, i); + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) + syncpt_op().reset_wait_base(sp, i); + wmb(); +} + +/** + * Updates sw shadow state for client managed registers + */ +void nvhost_syncpt_save(struct nvhost_syncpt *sp) +{ + u32 i; + + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { + if (nvhost_syncpt_client_managed(sp, i)) + syncpt_op().update_min(sp, i); + else + WARN_ON(!nvhost_syncpt_min_eq_max(sp, i)); + } + + for (i = 0; i < nvhost_syncpt_nb_bases(sp); i++) + syncpt_op().read_wait_base(sp, i); +} + +/** + * Updates the last value read from hardware. + */ +u32 nvhost_syncpt_update_min(struct nvhost_syncpt *sp, u32 id) +{ + u32 val; + + val = syncpt_op().update_min(sp, id); + + return val; +} + +/** + * Get the current syncpoint value + */ +u32 nvhost_syncpt_read(struct nvhost_syncpt *sp, u32 id) +{ + u32 val; + nvhost_module_busy(syncpt_to_dev(sp)->dev); + val = syncpt_op().update_min(sp, id); + nvhost_module_idle(syncpt_to_dev(sp)->dev); + return val; +} + +/** + * Get the current syncpoint base + */ +u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id) +{ + u32 val; + nvhost_module_busy(syncpt_to_dev(sp)->dev); + syncpt_op().read_wait_base(sp, id); + val = sp->base_val[id]; + nvhost_module_idle(syncpt_to_dev(sp)->dev); + return val; +} + +/** + * Write a cpu syncpoint increment to the hardware, without touching + * the cache. Caller is responsible for host being powered. + */ +void nvhost_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id) +{ + syncpt_op().cpu_incr(sp, id); +} + +/** + * Increment syncpoint value from cpu, updating cache + */ +void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id) +{ + if (nvhost_syncpt_client_managed(sp, id)) + nvhost_syncpt_incr_max(sp, id, 1); + nvhost_module_busy(syncpt_to_dev(sp)->dev); + nvhost_syncpt_cpu_incr(sp, id); + nvhost_module_idle(syncpt_to_dev(sp)->dev); +} + +/** + * Returns true if syncpoint is expired, false if we may need to wait + */ +bool nvhost_syncpt_is_expired( + struct nvhost_syncpt *sp, + u32 id, + u32 thresh) +{ + u32 current_val; + u32 future_val; + smp_rmb(); + current_val = (u32)atomic_read(&sp->min_val[id]); + future_val = (u32)atomic_read(&sp->max_val[id]); + + /* Note the use of unsigned arithmetic here (mod 1<<32). + * + * c = current_val = min_val = the current value of the syncpoint. + * t = thresh = the value we are checking + * f = future_val = max_val = the value c will reach when all + * outstanding increments have completed. + * + * Note that c always chases f until it reaches f. + * + * Dtf = (f - t) + * Dtc = (c - t) + * + * Consider all cases: + * + * A) .....c..t..f..... Dtf < Dtc need to wait + * B) .....c.....f..t.. Dtf > Dtc expired + * C) ..t..c.....f..... Dtf > Dtc expired (Dct very large) + * + * Any case where f==c: always expired (for any t). Dtf == Dcf + * Any case where t==c: always expired (for any f). Dtf >= Dtc (because Dtc==0) + * Any case where t==f!=c: always wait. Dtf < Dtc (because Dtf==0, + * Dtc!=0) + * + * Other cases: + * + * A) .....t..f..c..... Dtf < Dtc need to wait + * A) .....f..c..t..... Dtf < Dtc need to wait + * A) .....f..t..c..... Dtf > Dtc expired + * + * So: + * Dtf >= Dtc implies EXPIRED (return true) + * Dtf < Dtc implies WAIT (return false) + * + * Note: If t is expired then we *cannot* wait on it. We would wait + * forever (hang the system). + * + * Note: do NOT get clever and remove the -thresh from both sides. It + * is NOT the same. + * + * If future valueis zero, we have a client managed sync point. In that + * case we do a direct comparison. + */ + if (!nvhost_syncpt_client_managed(sp, id)) + return future_val - thresh >= current_val - thresh; + else + return (s32)(current_val - thresh) >= 0; +} + +void nvhost_syncpt_debug(struct nvhost_syncpt *sp) +{ + syncpt_op().debug(sp); +} +/* Displays the current value of the sync point via sysfs */ +static ssize_t syncpt_min_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct nvhost_syncpt_attr *syncpt_attr = + container_of(attr, struct nvhost_syncpt_attr, attr); + + return snprintf(buf, PAGE_SIZE, "%u", + nvhost_syncpt_read(&syncpt_attr->host->syncpt, + syncpt_attr->id)); +} + +static ssize_t syncpt_max_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + struct nvhost_syncpt_attr *syncpt_attr = + container_of(attr, struct nvhost_syncpt_attr, attr); + + return snprintf(buf, PAGE_SIZE, "%u", + nvhost_syncpt_read_max(&syncpt_attr->host->syncpt, + syncpt_attr->id)); +} + +int nvhost_syncpt_init(struct platform_device *dev, + struct nvhost_syncpt *sp) +{ + int i; + struct nvhost_master *host = syncpt_to_dev(sp); + int err = 0; + + /* Allocate structs for min, max and base values */ + sp->min_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), + GFP_KERNEL); + sp->max_val = kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_pts(sp), + GFP_KERNEL); + sp->base_val = kzalloc(sizeof(u32) * nvhost_syncpt_nb_bases(sp), + GFP_KERNEL); + sp->lock_counts = + kzalloc(sizeof(atomic_t) * nvhost_syncpt_nb_mlocks(sp), + GFP_KERNEL); + + if (!(sp->min_val && sp->max_val && sp->base_val && sp->lock_counts)) { + /* frees happen in the deinit */ + err = -ENOMEM; + goto fail; + } + + sp->kobj = kobject_create_and_add("syncpt", &dev->dev.kobj); + if (!sp->kobj) { + err = -EIO; + goto fail; + } + + /* Allocate two attributes for each sync point: min and max */ + sp->syncpt_attrs = kzalloc(sizeof(*sp->syncpt_attrs) + * nvhost_syncpt_nb_pts(sp) * 2, GFP_KERNEL); + if (!sp->syncpt_attrs) { + err = -ENOMEM; + goto fail; + } + + /* Fill in the attributes */ + for (i = 0; i < nvhost_syncpt_nb_pts(sp); i++) { + char name[MAX_SYNCPT_LENGTH]; + struct kobject *kobj; + struct nvhost_syncpt_attr *min = &sp->syncpt_attrs[i*2]; + struct nvhost_syncpt_attr *max = &sp->syncpt_attrs[i*2+1]; + + /* Create one directory per sync point */ + snprintf(name, sizeof(name), "%d", i); + kobj = kobject_create_and_add(name, sp->kobj); + if (!kobj) { + err = -EIO; + goto fail; + } + + min->id = i; + min->host = host; + min->attr.attr.name = min_name; + min->attr.attr.mode = S_IRUGO; + min->attr.show = syncpt_min_show; + if (sysfs_create_file(kobj, &min->attr.attr)) { + err = -EIO; + goto fail; + } + + max->id = i; + max->host = host; + max->attr.attr.name = max_name; + max->attr.attr.mode = S_IRUGO; + max->attr.show = syncpt_max_show; + if (sysfs_create_file(kobj, &max->attr.attr)) { + err = -EIO; + goto fail; + } + } + + return err; + +fail: + nvhost_syncpt_deinit(sp); + return err; +} + +void nvhost_syncpt_deinit(struct nvhost_syncpt *sp) +{ + kobject_put(sp->kobj); + + kfree(sp->min_val); + sp->min_val = NULL; + + kfree(sp->max_val); + sp->max_val = NULL; + + kfree(sp->base_val); + sp->base_val = NULL; + + kfree(sp->lock_counts); + sp->lock_counts = 0; + + kfree(sp->syncpt_attrs); + sp->syncpt_attrs = NULL; +} + +int nvhost_syncpt_client_managed(struct nvhost_syncpt *sp, u32 id) +{ + return BIT(id) & syncpt_to_dev(sp)->info.client_managed; +} + +int nvhost_syncpt_nb_pts(struct nvhost_syncpt *sp) +{ + return syncpt_to_dev(sp)->info.nb_pts; +} + +int nvhost_syncpt_nb_bases(struct nvhost_syncpt *sp) +{ + return syncpt_to_dev(sp)->info.nb_bases; +} + +int nvhost_syncpt_nb_mlocks(struct nvhost_syncpt *sp) +{ + return syncpt_to_dev(sp)->info.nb_mlocks; +} diff --git a/drivers/video/tegra/host/nvhost_syncpt.h b/drivers/video/tegra/host/nvhost_syncpt.h new file mode 100644 index 0000000..b883442 --- /dev/null +++ b/drivers/video/tegra/host/nvhost_syncpt.h @@ -0,0 +1,136 @@ +/* + * drivers/video/tegra/host/nvhost_syncpt.h + * + * Tegra host1x Syncpoints + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __NVHOST_SYNCPT_H +#define __NVHOST_SYNCPT_H + +#include <linux/kernel.h> +#include <linux/sched.h> +#include <linux/nvhost.h> +#include <linux/atomic.h> + +/* host managed and invalid syncpt id */ +#define NVSYNCPT_GRAPHICS_HOST (0) + +/* Attribute struct for sysfs min and max attributes */ +struct nvhost_syncpt_attr { + struct kobj_attribute attr; + struct nvhost_master *host; + int id; +}; + +struct nvhost_syncpt { + struct kobject *kobj; + atomic_t *min_val; + atomic_t *max_val; + u32 *base_val; + atomic_t *lock_counts; + const char **syncpt_names; + struct nvhost_syncpt_attr *syncpt_attrs; +}; + +int nvhost_syncpt_init(struct platform_device *, struct nvhost_syncpt *); +void nvhost_syncpt_deinit(struct nvhost_syncpt *); + +#define syncpt_to_dev(sp) container_of(sp, struct nvhost_master, syncpt) +#define SYNCPT_CHECK_PERIOD (2 * HZ) +#define MAX_STUCK_CHECK_COUNT 15 + +/** + * Updates the value sent to hardware. + */ +static inline u32 nvhost_syncpt_incr_max(struct nvhost_syncpt *sp, + u32 id, u32 incrs) +{ + return (u32)atomic_add_return(incrs, &sp->max_val[id]); +} + +/** + * Updated the value sent to hardware. + */ +static inline u32 nvhost_syncpt_set_max(struct nvhost_syncpt *sp, + u32 id, u32 val) +{ + atomic_set(&sp->max_val[id], val); + smp_wmb(); + return val; +} + +static inline u32 nvhost_syncpt_read_max(struct nvhost_syncpt *sp, u32 id) +{ + smp_rmb(); + return (u32)atomic_read(&sp->max_val[id]); +} + +static inline u32 nvhost_syncpt_read_min(struct nvhost_syncpt *sp, u32 id) +{ + smp_rmb(); + return (u32)atomic_read(&sp->min_val[id]); +} + +int nvhost_syncpt_client_managed(struct nvhost_syncpt *sp, u32 id); +int nvhost_syncpt_nb_pts(struct nvhost_syncpt *sp); +int nvhost_syncpt_nb_bases(struct nvhost_syncpt *sp); +int nvhost_syncpt_nb_mlocks(struct nvhost_syncpt *sp); + +static inline bool nvhost_syncpt_check_max(struct nvhost_syncpt *sp, + u32 id, u32 real) +{ + u32 max; + if (nvhost_syncpt_client_managed(sp, id)) + return true; + max = nvhost_syncpt_read_max(sp, id); + return (s32)(max - real) >= 0; +} + +/** + * Returns true if syncpoint min == max + */ +static inline bool nvhost_syncpt_min_eq_max(struct nvhost_syncpt *sp, u32 id) +{ + int min, max; + smp_rmb(); + min = atomic_read(&sp->min_val[id]); + max = atomic_read(&sp->max_val[id]); + return (min == max); +} + +void nvhost_syncpt_cpu_incr(struct nvhost_syncpt *sp, u32 id); + +u32 nvhost_syncpt_update_min(struct nvhost_syncpt *sp, u32 id); +bool nvhost_syncpt_is_expired(struct nvhost_syncpt *sp, u32 id, u32 thresh); + +void nvhost_syncpt_save(struct nvhost_syncpt *sp); + +void nvhost_syncpt_reset(struct nvhost_syncpt *sp); + +u32 nvhost_syncpt_read(struct nvhost_syncpt *sp, u32 id); +u32 nvhost_syncpt_read_wait_base(struct nvhost_syncpt *sp, u32 id); + +void nvhost_syncpt_incr(struct nvhost_syncpt *sp, u32 id); + +void nvhost_syncpt_debug(struct nvhost_syncpt *sp); + +static inline int nvhost_syncpt_is_valid(struct nvhost_syncpt *sp, u32 id) +{ + return id != NVSYNCPT_INVALID && id < nvhost_syncpt_nb_pts(sp); +} + +#endif diff --git a/include/linux/nvhost.h b/include/linux/nvhost.h new file mode 100644 index 0000000..20ba2a5 --- /dev/null +++ b/include/linux/nvhost.h @@ -0,0 +1,143 @@ +/* + * include/linux/nvhost.h + * + * Tegra host1x driver + * + * Copyright (c) 2009-2012, NVIDIA Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#ifndef __LINUX_NVHOST_H +#define __LINUX_NVHOST_H + +#include <linux/device.h> +#include <linux/types.h> +#include <linux/platform_device.h> + +struct nvhost_device_power_attr; + +#define NVHOST_MODULE_MAX_CLOCKS 3 +#define NVHOST_MODULE_MAX_POWERGATE_IDS 2 +#define NVHOST_MODULE_NO_POWERGATE_IDS .powergate_ids = {-1, -1} +#define NVHOST_DEFAULT_CLOCKGATE_DELAY .clockgate_delay = 25 +#define NVHOST_NAME_SIZE 24 +#define NVSYNCPT_INVALID (-1) + +enum nvhost_power_sysfs_attributes { + NVHOST_POWER_SYSFS_ATTRIB_CLOCKGATE_DELAY = 0, + NVHOST_POWER_SYSFS_ATTRIB_POWERGATE_DELAY, + NVHOST_POWER_SYSFS_ATTRIB_REFCOUNT, + NVHOST_POWER_SYSFS_ATTRIB_MAX +}; + +struct nvhost_clock { + char *name; + unsigned long default_rate; + int reset; +}; + +enum nvhost_device_powerstate_t { + NVHOST_POWER_STATE_DEINIT, + NVHOST_POWER_STATE_RUNNING, + NVHOST_POWER_STATE_CLOCKGATED, + NVHOST_POWER_STATE_POWERGATED +}; + +struct host1x_device_info { + int nb_channels; /* host1x: num channels supported */ + int nb_pts; /* host1x: num syncpoints supported */ + int nb_bases; /* host1x: num syncpoints supported */ + u32 client_managed; /* host1x: client managed syncpts */ + int nb_mlocks; /* host1x: number of mlocks */ + const char **syncpt_names; /* names of sync points */ +}; + +struct nvhost_device_data { + int version; /* ip version number of device */ + int id; /* Separates clients of same hw */ + int index; /* Hardware channel number */ + void __iomem *aperture; /* Iomem mapped to kernel */ + + u32 syncpts; /* Bitfield of sync points used */ + u32 modulemutexes; /* Bit field of module mutexes */ + + u32 class; /* Device class */ + bool serialize; /* Serialize submits in the channel */ + + int powergate_ids[NVHOST_MODULE_MAX_POWERGATE_IDS]; + bool can_powergate; /* True if module can be power gated */ + int clockgate_delay;/* Delay before clock gated */ + int powergate_delay;/* Delay before power gated */ + struct nvhost_clock clocks[NVHOST_MODULE_MAX_CLOCKS];/* Clock names */ + + struct delayed_work powerstate_down;/* Power state management */ + int num_clks; /* Number of clocks opened for dev */ + struct clk *clk[NVHOST_MODULE_MAX_CLOCKS]; + struct mutex lock; /* Power management lock */ + int powerstate; /* Current power state */ + int refcount; /* Number of tasks active */ + wait_queue_head_t idle_wq; /* Work queue for idle */ + + struct nvhost_channel *channel; /* Channel assigned for the module */ + struct kobject *power_kobj; /* kobj to hold power sysfs entries */ + struct nvhost_device_power_attr *power_attrib; /* sysfs attributes */ + struct dentry *debugfs; /* debugfs directory */ + + void *private_data; /* private platform data */ + struct platform_device *pdev; /* owner platform_device */ + + /* Finalize power on. Can be used for context restore. */ + void (*finalize_poweron)(struct platform_device *dev); + + /* Device is busy. */ + void (*busy)(struct platform_device *); + + /* Device is idle. */ + void (*idle)(struct platform_device *); + + /* Device is going to be suspended */ + void (*suspend_ndev)(struct platform_device *); + + /* Device is initialized */ + void (*init)(struct platform_device *dev); + + /* Device is de-initialized. */ + void (*deinit)(struct platform_device *dev); + + /* Preparing for power off. Used for context save. */ + int (*prepare_poweroff)(struct platform_device *dev); + + /* Clock gating callbacks */ + int (*prepare_clockoff)(struct platform_device *dev); + void (*finalize_clockon)(struct platform_device *dev); +}; + +struct nvhost_device_power_attr { + struct platform_device *ndev; + struct kobj_attribute power_attr[NVHOST_POWER_SYSFS_ATTRIB_MAX]; +}; + +/* public host1x power management APIs */ +bool host1x_powered(struct platform_device *dev); +void host1x_busy(struct platform_device *dev); +void host1x_idle(struct platform_device *dev); + +/* public host1x sync-point management APIs */ +u32 host1x_syncpt_incr_max(u32 id, u32 incrs); +void host1x_syncpt_incr(u32 id); +u32 host1x_syncpt_read(u32 id); + +#endif
Add nvhost, the driver for host1x. This patch adds support for reading and incrementing sync points and dynamic power management. Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com> --- drivers/video/Kconfig | 2 + drivers/video/Makefile | 2 + drivers/video/tegra/host/Kconfig | 5 + drivers/video/tegra/host/Makefile | 10 + drivers/video/tegra/host/chip_support.c | 48 ++ drivers/video/tegra/host/chip_support.h | 52 +++ drivers/video/tegra/host/dev.c | 96 ++++ drivers/video/tegra/host/host1x/Makefile | 7 + drivers/video/tegra/host/host1x/host1x.c | 204 +++++++++ drivers/video/tegra/host/host1x/host1x.h | 78 ++++ drivers/video/tegra/host/host1x/host1x01.c | 37 ++ drivers/video/tegra/host/host1x/host1x01.h | 29 ++ .../video/tegra/host/host1x/host1x01_hardware.h | 36 ++ drivers/video/tegra/host/host1x/host1x_syncpt.c | 156 +++++++ drivers/video/tegra/host/host1x/hw_host1x01_sync.h | 398 ++++++++++++++++ drivers/video/tegra/host/nvhost_acm.c | 481 ++++++++++++++++++++ drivers/video/tegra/host/nvhost_acm.h | 45 ++ drivers/video/tegra/host/nvhost_syncpt.c | 333 ++++++++++++++ drivers/video/tegra/host/nvhost_syncpt.h | 136 ++++++ include/linux/nvhost.h | 143 ++++++ 20 files changed, 2298 insertions(+) create mode 100644 drivers/video/tegra/host/Kconfig create mode 100644 drivers/video/tegra/host/Makefile create mode 100644 drivers/video/tegra/host/chip_support.c create mode 100644 drivers/video/tegra/host/chip_support.h create mode 100644 drivers/video/tegra/host/dev.c create mode 100644 drivers/video/tegra/host/host1x/Makefile create mode 100644 drivers/video/tegra/host/host1x/host1x.c create mode 100644 drivers/video/tegra/host/host1x/host1x.h create mode 100644 drivers/video/tegra/host/host1x/host1x01.c create mode 100644 drivers/video/tegra/host/host1x/host1x01.h create mode 100644 drivers/video/tegra/host/host1x/host1x01_hardware.h create mode 100644 drivers/video/tegra/host/host1x/host1x_syncpt.c create mode 100644 drivers/video/tegra/host/host1x/hw_host1x01_sync.h create mode 100644 drivers/video/tegra/host/nvhost_acm.c create mode 100644 drivers/video/tegra/host/nvhost_acm.h create mode 100644 drivers/video/tegra/host/nvhost_syncpt.c create mode 100644 drivers/video/tegra/host/nvhost_syncpt.h create mode 100644 include/linux/nvhost.h