Message ID | 1379510692-32435-9-git-send-email-treding@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, 18 Sep 2013 15:24:50 +0200, Thierry Reding <thierry.reding@gmail.com> wrote:
> Interrupt references are currently resolved very early (when a device is
> created). This has the disadvantage that it will fail in cases where the
> interrupt parent hasn't been probed and no IRQ domain for it has been
> registered yet. To work around that various drivers use explicit
> initcall ordering to force interrupt parents to be probed before devices
> that need them are created. That's error prone and doesn't always work.
> If a platform device uses an interrupt line connected to a different
> platform device (such as a GPIO controller), both will be created in the
> same batch, and the GPIO controller won't have been probed by its driver
> when the depending platform device is created. Interrupt resolution will
> fail in that case.

What is the reason for all the rework on the irq parsing return values?
A return value of '0' is always an error on irq parsing, regardless of
architecture even if NO_IRQ is defined as -1. I may have missed it, but
I don't see any checking for specific error values in the return paths
of the functions.

If the specific return value isn't required (and I don't think it is),
then you can simplify the whole series by getting rid of the rework
patches.

g.

>
> Another common workaround is for drivers to explicitly resolve interrupt
> references at probe time. This is suboptimal, however, because it will
> require every driver to duplicate the code.
>
> This patch adds support for late interrupt resolution to the platform
> driver core, by resolving the references right before a device driver's
> .probe() function will be called. This not only delays the resolution
> until a much later time (giving interrupt parents a better chance of
> being probed in the meantime), but it also allows the platform driver
> core to queue the device for deferred probing if the interrupt parent
> hasn't registered its IRQ domain yet.
On Wed, Oct 16, 2013 at 12:24:36AM +0100, Grant Likely wrote:
> On Wed, 18 Sep 2013 15:24:50 +0200, Thierry Reding <thierry.reding@gmail.com> wrote:
> > Interrupt references are currently resolved very early (when a device is
> > created). This has the disadvantage that it will fail in cases where the
> > interrupt parent hasn't been probed and no IRQ domain for it has been
> > registered yet. To work around that various drivers use explicit
> > initcall ordering to force interrupt parents to be probed before devices
> > that need them are created. That's error prone and doesn't always work.
> > If a platform device uses an interrupt line connected to a different
> > platform device (such as a GPIO controller), both will be created in the
> > same batch, and the GPIO controller won't have been probed by its driver
> > when the depending platform device is created. Interrupt resolution will
> > fail in that case.
>
> What is the reason for all the rework on the irq parsing return values?
> A return value of '0' is always an error on irq parsing, regardless of
> architecture even if NO_IRQ is defined as -1. I may have missed it, but
> I don't see any checking for specific error values in the return paths
> of the functions.
>
> If the specific return value isn't required (and I don't think it is),
> then you can simplify the whole series by getting rid of the rework
> patches.

The whole reason for this patch set is to propagate the precise error
code so that when one of the top-level OF IRQ functions is called (such
as irq_of_parse_and_map()) the caller can actually make a reasonable
choice on how to handle the error.

More precisely, the goal of this series was to propagate failure to
create a mapping, due to an IRQ domain not having been registered yet
for the device node passed into irq_create_of_mapping(), back to the
caller, irq_of_parse_and_map(), which can then propagate it further.
Ultimately this will allow driver probing to fail with EPROBE_DEFER when
IRQ mapping fails and allow deferred probing to be triggered.

This cannot be done if all you have as error status is 0. Mapping of
IRQs can fail for a number of reasons, such as when an IRQ descriptor
cannot be allocated or when an IRQ domain's .xlate() fails. You don't
want to be deferring probe on all errors because some of them are
genuinely fatal and cannot be recovered from by deferring probe.

With the current implementation in the kernel, interrupt references are
resolved very early, usually when a device is instantiated from the
device tree. So unless all interrupt parents of all devices have been
probed by that time (which usually can only be done using explicit
initcall ordering, and even in that case doesn't always work), many
devices will end up with an invalid interrupt number.

The typical case where this can happen is if you have a GPIO expander on
an I2C bus that provides interrupt services to other devices. With the
current implementation, the GPIO expander will be probed fairly late, at
which point many of its users will already have been instantiated and
assigned an invalid interrupt.

Many drivers try to work around that by explicitly calling
irq_of_parse_and_map() within their .probe() function because that's
usually called sometime after the device's instantiation. However, even
that isn't guaranteed to work. If the GPIO expander itself depends on
other resources that cause it to require deferred probing, or if its
driver is built as a module, making the registration of the
corresponding IRQ domain completely non-deterministic, then this can
fail just as easily.

With this patch series all of these issues should go away. All of the
dependencies should be resolvable by using deferred probing.

Furthermore the mechanism introduced to have the core resolve the IRQ
references can be used to request other standard resources as well. A
particular one that I'm aware of is how IOMMUs are associated with
devices. Currently a variety of quirks have been proposed to work around
these issues, such as reordering nodes in the device tree, which only
work because the DTC implementation that everybody uses happens to keep
them ordered in the same way in the DTB as they were in the DTS.

Thierry
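The argument above, that a bare 0 cannot tell "retry later" apart from "genuinely broken", can be sketched in user-space C. This is only an illustrative model, not kernel code: `struct model_irq_parent` and every `model_*` function are hypothetical names invented for the sketch, and `EPROBE_DEFER` is defined locally with the kernel's value because user-space `<errno.h>` does not provide it.

```c
#include <errno.h>
#include <stdbool.h>

/* Not in user-space <errno.h>; the value mirrors the kernel's definition. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for an interrupt parent such as a GPIO expander. */
struct model_irq_parent {
	bool domain_registered;	/* has the parent's driver been probed yet? */
	unsigned int num_lines;	/* interrupt lines the parent provides */
	unsigned int virq_base;	/* first IRQ number handed out, must be > 0 */
};

/* Lowest level: the only place that knows why the mapping failed. */
static int model_create_mapping(const struct model_irq_parent *parent,
				unsigned int hwirq)
{
	if (!parent->domain_registered)
		return -EPROBE_DEFER;	/* recoverable: parent may probe later */
	if (hwirq >= parent->num_lines)
		return -EINVAL;		/* fatal: retrying cannot fix a bad specifier */
	return (int)(parent->virq_base + hwirq);
}

/* Intermediate level that passes the precise error code through. */
static int model_parse_and_map(const struct model_irq_parent *parent,
			       unsigned int hwirq)
{
	return model_create_mapping(parent, hwirq);
}

/* "0 means error" convention: both failure causes collapse into 0. */
static unsigned int model_parse_and_map_legacy(const struct model_irq_parent *parent,
					       unsigned int hwirq)
{
	int ret = model_create_mapping(parent, hwirq);

	return ret > 0 ? (unsigned int)ret : 0;
}

/* Top level: only a precise code lets the caller choose to defer. */
static bool model_should_defer(int ret)
{
	return ret == -EPROBE_DEFER;
}
```

With the legacy wrapper, an unregistered parent and an out-of-range line both yield 0, so a caller would have to either defer on every failure or on none. With the propagated code, `model_should_defer()` can single out the one case where retrying makes sense.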
On Wed, 16 Oct 2013 00:24:36 +0100, Grant Likely <grant.likely@linaro.org> wrote:
> On Wed, 18 Sep 2013 15:24:50 +0200, Thierry Reding <thierry.reding@gmail.com> wrote:
> > Interrupt references are currently resolved very early (when a device is
> > created). This has the disadvantage that it will fail in cases where the
> > interrupt parent hasn't been probed and no IRQ domain for it has been
> > registered yet. To work around that various drivers use explicit
> > initcall ordering to force interrupt parents to be probed before devices
> > that need them are created. That's error prone and doesn't always work.
> > If a platform device uses an interrupt line connected to a different
> > platform device (such as a GPIO controller), both will be created in the
> > same batch, and the GPIO controller won't have been probed by its driver
> > when the depending platform device is created. Interrupt resolution will
> > fail in that case.
>
> What is the reason for all the rework on the irq parsing return values?
> A return value of '0' is always an error on irq parsing, regardless of
> architecture even if NO_IRQ is defined as -1. I may have missed it, but
> I don't see any checking for specific error values in the return paths
> of the functions.
>
> If the specific return value isn't required (and I don't think it is),
> then you can simplify the whole series by getting rid of the rework
> patches.

I've not heard back about the above, but I've just had a conversation
with Rob about what to do here.

The problem that I have is that it makes a specific return code need
to traverse several levels of function calls and have a meaning come
out the other end. It becomes difficult to figure out where that code
actually comes from when reading the code. That's more of a gut-feel
reaction rather than pointing out specifics though.

The other thing that makes me nervous is how invasive the series is.

However, even with saying all of the above, I'm not saying outright no.
I want to get this feature in.

It is obviously needed and I'll even merge the patches piecemeal as they
look ready (I've already merged 2). Regardless, the current series needs
to be reworked. It conflicts with the other IRQ rework that I've already
put into my tree. The best thing to do would probably be to respin it
against my current tree and repost.

I'll take a fresh look then.... In the meantime, anything you can do to
/sanely/ reduce the impact will probably help. :-)

g.
On Thu, Oct 24, 2013 at 05:37:49PM +0100, Grant Likely wrote:
> On Wed, 16 Oct 2013 00:24:36 +0100, Grant Likely <grant.likely@linaro.org> wrote:
> > On Wed, 18 Sep 2013 15:24:50 +0200, Thierry Reding <thierry.reding@gmail.com> wrote:
> > > Interrupt references are currently resolved very early (when a device is
> > > created). This has the disadvantage that it will fail in cases where the
> > > interrupt parent hasn't been probed and no IRQ domain for it has been
> > > registered yet. To work around that various drivers use explicit
> > > initcall ordering to force interrupt parents to be probed before devices
> > > that need them are created. That's error prone and doesn't always work.
> > > If a platform device uses an interrupt line connected to a different
> > > platform device (such as a GPIO controller), both will be created in the
> > > same batch, and the GPIO controller won't have been probed by its driver
> > > when the depending platform device is created. Interrupt resolution will
> > > fail in that case.
> >
> > What is the reason for all the rework on the irq parsing return values?
> > A return value of '0' is always an error on irq parsing, regardless of
> > architecture even if NO_IRQ is defined as -1. I may have missed it, but
> > I don't see any checking for specific error values in the return paths
> > of the functions.
> >
> > If the specific return value isn't required (and I don't think it is),
> > then you can simplify the whole series by getting rid of the rework
> > patches.
>
> I've not heard back about the above, but I've just had a conversation
> with Rob about what to do here.

I thought I had sent a reply regarding this about a week ago. Perhaps it
got lost. I'll resend.

> The problem that I have is that it makes a specific return code need
> to traverse several levels of function calls and have a meaning come
> out the other end. It becomes difficult to figure out where that code
> actually comes from when reading the code. That's more of a gut-feel
> reaction rather than pointing out specifics though.

To be honest, I'm not all that happy with that aspect myself, but at the
same time I didn't feel like duplicating a lot of code to get this done
more easily. I imagine that would've caused significant pushback as
well.

It's somewhat unfortunate that we have to propagate back through several
levels, but that's just the way the code is currently written and I
don't think we can really get the information (EPROBE_DEFER) from any
other place but from the lowest level.

> The other thing that makes me nervous is how invasive the series is.

Well, I guess that comes with the territory, doesn't it? Interrupts are
used in a large number of places and they have been used in a very
static manner so far. The end result of this patch series is that for
most devices instantiated from the device tree interrupts end up in the
same category as any other resources such as GPIOs, regulators or
clocks. They become mostly dynamic. That in itself is a big change, so I
don't think it's all that surprising that the required changes are
invasive.

And I think if we really want to solve it properly we need to make even
more invasive changes. For example, Grygorii pointed out that we could
have a setup in the future where the following happens:

  1) driver providing interrupts is probed
  2) driver using interrupts is probed, interrupt references are
     resolved at probe time
  3) both drivers are unloaded
  4) both drivers are reloaded

In that case with the current set of patches the added core code assumes
that the interrupts have already been resolved and are still valid.
Possibly the easiest way to fix that would be to just zero out the
interrupt resources on remove so that they can be re-resolved on next
probe. But that's somewhat cumbersome and it seems to me like a better
fix might be to go and change struct platform_device to not use a single
array of resources, but rather a list, or perhaps an array per type of
resource. The current platform_device structure is simple and easy, but
it doesn't work well with all the new dynamicity that we want/need/have
today.

Obviously modifying the innards of struct platform_device will likely
turn out to be a mammoth task of its own, but if that's what it takes
I'm prepared to do that as well. Or at least try.

> However, even with saying all of the above, I'm not saying outright no.
> I want to get this feature in.

That's good to hear. Last time we talked about it we seemed to have an
agreement that this needs to be done, but you not replying had me
worried that perhaps you had changed your mind. It seems you've been
busy trying to address other issues that maybe are even more pressing so
I can hardly complain. =)

I'm good as long as we can keep moving in the right direction.

> It is obviously needed and I'll even merge the patches piecemeal as they
> look ready (I've already merged 2). Regardless, the current series needs
> to be reworked. It conflicts with the other IRQ rework that I've already
> put into my tree. The best thing to do would probably be to respin it
> against my current tree and repost.

Sure, that won't be a problem. I might not get to it immediately, but
I'll get back to it.

> I'll take a fresh look then.... In the meantime, anything you can do to
> /sanely/ reduce the impact will probably help. :-)

I might be able to do that. But I'll mention that in another thread in
the right context.

Thierry
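The unload/reload scenario described above can be modelled in a few lines of user-space C. This is a hedged sketch, not the kernel's implementation: the `model_*` types and functions are hypothetical, the resource table is a small fixed array rather than `struct platform_device`, and the IRQ number is simply passed in to stand for whatever the interrupt provider would hand out on a given load.

```c
#include <errno.h>

enum model_res_type { MODEL_RES_MEM, MODEL_RES_IRQ };

struct model_resource {
	enum model_res_type type;
	unsigned int start;
};

#define MODEL_MAX_RES 8

/* Hypothetical device with a single flat resource table. */
struct model_device {
	struct model_resource resource[MODEL_MAX_RES];
	unsigned int num_resources;
};

/* Returns the first IRQ in the table, or -ENXIO if none is present. */
static int model_get_irq(const struct model_device *dev)
{
	unsigned int i;

	for (i = 0; i < dev->num_resources; i++)
		if (dev->resource[i].type == MODEL_RES_IRQ)
			return (int)dev->resource[i].start;
	return -ENXIO;
}

/*
 * Probe-time resolution: if an IRQ entry already exists it is assumed
 * to be valid and kept, which is the assumption that breaks on reload.
 */
static int model_probe(struct model_device *dev, unsigned int virq)
{
	if (model_get_irq(dev) >= 0)
		return 0;	/* looks resolved: a stale value survives */
	if (dev->num_resources == MODEL_MAX_RES)
		return -ENOSPC;
	dev->resource[dev->num_resources].type = MODEL_RES_IRQ;
	dev->resource[dev->num_resources].start = virq;
	dev->num_resources++;
	return 0;
}

/* Remove-time cleanup: drop IRQ entries so the next probe re-resolves. */
static void model_remove(struct model_device *dev)
{
	unsigned int i, kept = 0;

	for (i = 0; i < dev->num_resources; i++)
		if (dev->resource[i].type != MODEL_RES_IRQ)
			dev->resource[kept++] = dev->resource[i];
	dev->num_resources = kept;
}
```

Probing twice without calling `model_remove()` in between keeps the first IRQ number even if the provider now hands out a different one. Calling `model_remove()` first forces the second probe to pick up the new value, at the cost of the extra bookkeeping the message calls cumbersome.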
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 4f8bef3..8dcf835 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -481,6 +481,10 @@ static int platform_drv_probe(struct device *_dev)
 	struct platform_device *dev = to_platform_device(_dev);
 	int ret;
 
+	ret = of_platform_probe(dev);
+	if (ret)
+		return ret;
+
 	if (ACPI_HANDLE(_dev))
 		acpi_dev_pm_attach(_dev, true);
 
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 9b439ac..df6d56e 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -142,7 +142,7 @@ struct platform_device *of_device_alloc(struct device_node *np,
 				  struct device *parent)
 {
 	struct platform_device *dev;
-	int rc, i, num_reg = 0, num_irq;
+	int rc, i, num_reg = 0;
 	struct resource *res, temp_res;
 
 	dev = platform_device_alloc("", -1);
@@ -153,23 +153,21 @@ struct platform_device *of_device_alloc(struct device_node *np,
 	if (of_can_translate_address(np))
 		while (of_address_to_resource(np, num_reg, &temp_res) == 0)
 			num_reg++;
-	num_irq = of_irq_count(np);
 
 	/* Populate the resource table */
-	if (num_irq || num_reg) {
-		res = kzalloc(sizeof(*res) * (num_irq + num_reg), GFP_KERNEL);
+	if (num_reg) {
+		res = kzalloc(sizeof(*res) * num_reg, GFP_KERNEL);
 		if (!res) {
 			platform_device_put(dev);
 			return NULL;
 		}
 
-		dev->num_resources = num_reg + num_irq;
+		dev->num_resources = num_reg;
 		dev->resource = res;
 		for (i = 0; i < num_reg; i++, res++) {
 			rc = of_address_to_resource(np, i, res);
 			WARN_ON(rc);
 		}
-		WARN_ON(of_irq_to_resource_table(np, res, num_irq) != num_irq);
 	}
 
 	dev->dev.of_node = of_node_get(np);
@@ -490,4 +488,101 @@ int of_platform_populate(struct device_node *root,
 	return rc;
 }
 EXPORT_SYMBOL_GPL(of_platform_populate);
+
+/**
+ * of_platform_parse_irq() - parse interrupt resource from device node
+ * @pdev: pointer to platform device
+ *
+ * Returns 0 on success or a negative error code on failure.
+ */
+static int of_platform_parse_irq(struct platform_device *pdev)
+{
+	struct device_node *np = pdev->dev.of_node;
+	unsigned int num_res = pdev->num_resources;
+	struct resource *res = pdev->resource;
+	unsigned int num_irq, num, c;
+	int ret = 0;
+
+	num_irq = of_irq_count(pdev->dev.of_node);
+	if (!num_irq)
+		return 0;
+
+	/*
+	 * Deferred probing may cause this function to be called multiple
+	 * times, so check if all interrupts have been parsed already and
+	 * return early.
+	 */
+	for (c = 0; c < num_irq; c++)
+		if (platform_get_irq(pdev, c) < 0)
+			break;
+
+	if (c == num_irq)
+		return 0;
+
+	num = num_res + num_irq;
+
+	/*
+	 * Note that in case we're called twice on the same device (due to
+	 * deferred probing for example) this will simply be a nop because
+	 * krealloc() returns the input pointer if the size of the memory
+	 * block that it points to is larger than or equal to the new size
+	 * being requested.
+	 */
+	res = krealloc(res, num * sizeof(*res), GFP_KERNEL);
+	if (!res)
+		return -ENOMEM;
+
+	pdev->resource = res;
+	res += num_res;
+
+	/*
+	 * It is possible for this to fail. If so, not that the number of
+	 * resources is not updated, so that the next call to this function
+	 * will parse all interrupts again. Otherwise we can't keep track of
+	 * how many we've parsed so far.
+	 */
+	ret = of_irq_to_resource_table(np, res, num_irq);
+	if (ret < 0)
+		return ret;
+
+	/*
+	 * All interrupts are guaranteed to have been parsed and stored in
+	 * the resource table, so the number of resources can now safely be
+	 * updated.
+	 */
+	pdev->num_resources += num_irq;
+
+	return 0;
+}
+
+/**
+ * of_platform_probe() - OF specific initialization at probe time
+ * @pdev: pointer to a platform device
+ *
+ * This function is called by the driver core to perform devicetree-specific
+ * setup for a given platform device at probe time. If a device's resources
+ * as specified in the device tree are not available yet, this function can
+ * return -EPROBE_DEFER and cause the device to be probed again later, when
+ * other drivers that potentially provide the missing resources have been
+ * probed in turn.
+ *
+ * Note that because of the above, all code executed by this function must
+ * be prepared to be run multiple times on the same device (i.e. it must be
+ * idempotent).
+ *
+ * Returns 0 on success or a negative error code on failure.
+ */
+int of_platform_probe(struct platform_device *pdev)
+{
+	int ret;
+
+	if (!pdev->dev.of_node)
+		return 0;
+
+	ret = of_platform_parse_irq(pdev);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
 #endif /* CONFIG_OF_ADDRESS */
diff --git a/include/linux/of_platform.h b/include/linux/of_platform.h
index 05cb4a9..92fc4f6 100644
--- a/include/linux/of_platform.h
+++ b/include/linux/of_platform.h
@@ -72,6 +72,8 @@ extern int of_platform_populate(struct device_node *root,
 					const struct of_device_id *matches,
 					const struct of_dev_auxdata *lookup,
 					struct device *parent);
+
+extern int of_platform_probe(struct platform_device *pdev);
 #else
 static inline int of_platform_populate(struct device_node *root,
 					const struct of_device_id *matches,
@@ -80,6 +82,11 @@ static inline int of_platform_populate(struct device_node *root,
 {
 	return -ENODEV;
 }
+
+static inline int of_platform_probe(struct platform_device *pdev)
+{
+	return 0;
+}
 #endif
 
 #endif /* _LINUX_OF_PLATFORM_H */
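The idempotency requirement stated in the kernel-doc above, and the "grow the table first, commit the count last" pattern used by of_platform_parse_irq(), can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than the patch's code: all `model_*` names are hypothetical, `realloc()` stands in for `krealloc()`, a `parent_ready` flag stands in for the interrupt parent having registered its IRQ domain, and `EPROBE_DEFER` is defined locally with the kernel's value.

```c
#include <errno.h>
#include <stdlib.h>

/* Not in user-space <errno.h>; the value mirrors the kernel's definition. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

struct model_resource {
	int is_irq;
	int value;
};

/* Hypothetical device: a growable table plus what its node describes. */
struct model_device {
	struct model_resource *resource;
	unsigned int num_resources;	/* committed entries only */
	unsigned int num_irq;		/* IRQs described by the device node */
	int parent_ready;		/* has the interrupt parent probed? */
};

static unsigned int model_count_parsed_irqs(const struct model_device *dev)
{
	unsigned int i, n = 0;

	for (i = 0; i < dev->num_resources; i++)
		if (dev->resource[i].is_irq)
			n++;
	return n;
}

/* May be called any number of times; only full success changes the count. */
static int model_parse_irqs(struct model_device *dev)
{
	unsigned int i, old = dev->num_resources;
	struct model_resource *res;

	if (!dev->num_irq)
		return 0;

	/* Everything was parsed by an earlier call: nothing left to do. */
	if (model_count_parsed_irqs(dev) == dev->num_irq)
		return 0;

	res = realloc(dev->resource, (old + dev->num_irq) * sizeof(*res));
	if (!res)
		return -ENOMEM;
	dev->resource = res;

	/* Failure leaves num_resources untouched, so a retry starts over. */
	if (!dev->parent_ready)
		return -EPROBE_DEFER;

	for (i = 0; i < dev->num_irq; i++) {
		res[old + i].is_irq = 1;
		res[old + i].value = 100 + (int)i;	/* placeholder IRQ number */
	}

	/* Commit only after every interrupt has been stored. */
	dev->num_resources = old + dev->num_irq;
	return 0;
}
```

Calling `model_parse_irqs()` while `parent_ready` is 0 returns `-EPROBE_DEFER` and leaves `num_resources` unchanged. Once the flag is set, the next call fills the table and any further call returns 0 without touching it.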
Interrupt references are currently resolved very early (when a device is
created). This has the disadvantage that it will fail in cases where the
interrupt parent hasn't been probed and no IRQ domain for it has been
registered yet. To work around that, various drivers use explicit
initcall ordering to force interrupt parents to be probed before devices
that need them are created. That's error-prone and doesn't always work.
If a platform device uses an interrupt line connected to a different
platform device (such as a GPIO controller), both will be created in the
same batch, and the GPIO controller won't have been probed by its driver
when the depending platform device is created. Interrupt resolution will
fail in that case.

Another common workaround is for drivers to explicitly resolve interrupt
references at probe time. This is suboptimal, however, because it will
require every driver to duplicate the code.

This patch adds support for late interrupt resolution to the platform
driver core, by resolving the references right before a device driver's
.probe() function will be called. This not only delays the resolution
until a much later time (giving interrupt parents a better chance of
being probed in the meantime), but it also allows the platform driver
core to queue the device for deferred probing if the interrupt parent
hasn't registered its IRQ domain yet.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- split off IRQ parsing into separate function to make code flow simpler
- add comments to point out some aspects of the implementation
- make code idempotent (as pointed out by Grygorii Strashko)

 drivers/base/platform.c     |   4 ++
 drivers/of/platform.c       | 107 +++++++++++++++++++++++++++++++++++++++++---
 include/linux/of_platform.h |   7 +++
 3 files changed, 112 insertions(+), 6 deletions(-)