diff mbox

[2/9] pm: domains: avoid potential oops in pm_genpd_remove_device()

Message ID E1YW7sq-0003m6-89@rmk-PC.arm.linux.org.uk (mailing list archive)
State RFC, archived
Headers show

Commit Message

Russell King March 12, 2015, 6:31 p.m. UTC
pm_genpd_remove_device() should only be called with valid and present
pm domain.  There are circumstances where we may end up with something
that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
stuff.)

Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ulf Hansson March 13, 2015, 8:56 a.m. UTC | #1
On 12 March 2015 at 19:31, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
> pm_genpd_remove_device() should only be called with valid and present
> pm domain.  There are circumstances where we may end up with something
> that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
> stuff.)

Could the "vga_switcheroo" code deal with this instead?

>
> Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>  drivers/base/power/domain.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index b3fbc21da2dc..11a1023fa64a 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -1509,7 +1509,7 @@ int pm_genpd_remove_device(struct generic_pm_domain *genpd,
>
>         dev_dbg(dev, "%s()\n", __func__);
>
> -       if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev)
> +       if (!pm_genpd_present(genpd) || IS_ERR_OR_NULL(dev)

The are two issues with this approach.

1. pm_genpd_present() is build only for CONFIG_PM_SLEEP set.
2. pm_genpd_present() totally ignores holding the mutex which protects
the list of GPDs.

>             ||  IS_ERR_OR_NULL(dev->pm_domain)
>             ||  pd_to_genpd(dev->pm_domain) != genpd)
>                 return -EINVAL;
> --
> 1.8.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux March 13, 2015, 9:20 a.m. UTC | #2
On Fri, Mar 13, 2015 at 09:56:02AM +0100, Ulf Hansson wrote:
> On 12 March 2015 at 19:31, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
> > pm_genpd_remove_device() should only be called with valid and present
> > pm domain.  There are circumstances where we may end up with something
> > that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
> > stuff.)
> 
> Could the "vga_switcheroo" code deal with this instead?

How is there any possibility what so ever that vga_switcherroo could
deal with this?

The problem is if something which isn't a generic PM domain is registered
in dev->pm_domain, pm_genpd_remove_device() will treat it as a generic PM
domain, and try and de-reference it as if it were a generic PM domain.

The problem is the generic PM domain code _assuming_ that whatever it
finds in dev->pm_domain is always a generic PM domain.  That is a broken
assumption.

> 
> >
> > Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > ---
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index b3fbc21da2dc..11a1023fa64a 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -1509,7 +1509,7 @@ int pm_genpd_remove_device(struct generic_pm_domain *genpd,
> >
> >         dev_dbg(dev, "%s()\n", __func__);
> >
> > -       if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev)
> > +       if (!pm_genpd_present(genpd) || IS_ERR_OR_NULL(dev)
> 
> The are two issues with this approach.
> 
> 1. pm_genpd_present() is build only for CONFIG_PM_SLEEP set.
> 2. pm_genpd_present() totally ignores holding the mutex which protects
> the list of GPDs.

Okay, I'll fix both of those by making it always available and by
taking the appropriate lock, and removing the duplication in
genpd_dev_pm_detach().
Geert Uytterhoeven March 13, 2015, 12:45 p.m. UTC | #3
On Fri, Mar 13, 2015 at 10:20 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Mar 13, 2015 at 09:56:02AM +0100, Ulf Hansson wrote:
>> On 12 March 2015 at 19:31, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
>> > pm_genpd_remove_device() should only be called with valid and present
>> > pm domain.  There are circumstances where we may end up with something
>> > that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
>> > stuff.)
>>
>> Could the "vga_switcheroo" code deal with this instead?
>
> How is there any possibility what so ever that vga_switcherroo could
> deal with this?
>
> The problem is if something which isn't a generic PM domain is registered
> in dev->pm_domain, pm_genpd_remove_device() will treat it as a generic PM
> domain, and try and de-reference it as if it were a generic PM domain.
>
> The problem is the generic PM domain code _assuming_ that whatever it
> finds in dev->pm_domain is always a generic PM domain.  That is a broken
> assumption.

Indeed, currently it's not possible to know which derivative of dev_pm_domain
is used. For genpd, you have to look it up in the genpd list.
This also complicates adding debug code.
In addition, it can be statically or dynamically allocated. Hence sometimes %ps
can tell you the type, sometimes it can't.

Should we add a type field to struct dev_pm_domain?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux March 13, 2015, 1:23 p.m. UTC | #4
On Fri, Mar 13, 2015 at 09:56:02AM +0100, Ulf Hansson wrote:
> On 12 March 2015 at 19:31, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index b3fbc21da2dc..11a1023fa64a 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -1509,7 +1509,7 @@ int pm_genpd_remove_device(struct generic_pm_domain *genpd,
> >
> >         dev_dbg(dev, "%s()\n", __func__);
> >
> > -       if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev)
> > +       if (!pm_genpd_present(genpd) || IS_ERR_OR_NULL(dev)
> 
> The are two issues with this approach.
> 
> 1. pm_genpd_present() is build only for CONFIG_PM_SLEEP set.
> 2. pm_genpd_present() totally ignores holding the mutex which protects
> the list of GPDs.

Well, having looked more deeply at this, the PM domain code has more
problems.

void pm_genpd_syscore_poweroff(struct device *dev)
{
        genpd_syscore_switch(dev, true);
}
EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweroff);

void pm_genpd_syscore_poweron(struct device *dev)
{
        genpd_syscore_switch(dev, false);
}
EXPORT_SYMBOL_GPL(pm_genpd_syscore_poweron);

both of these functions are externally callable without holding the
gpd_list_lock mutex, which results in:

static void genpd_syscore_switch(struct device *dev, bool suspend)
{
        struct generic_pm_domain *genpd;

        genpd = dev_to_genpd(dev);
        if (!pm_genpd_present(genpd))
                return;

calling this function without holding the gpd_list_lock either.

It also seems that there's a variety of ways which the generic PM domain
code gets from a struct device to a generic_pm_domain - one is the above.
Another is having this open coded:

        mutex_lock(&gpd_list_lock);
        list_for_each_entry(gpd, &gpd_list, gpd_list_node) {
                if (&gpd->domain == dev->pm_domain) {
                        pd = gpd;
                        break;
                }
        }
        mutex_unlock(&gpd_list_lock);

This is only _not_ buggy because we only ever add to the list, never
remove from it, so we can be sure that a successfully obtained "pd"
will remain valid after the lock is dropped.

Another is:

        genpd = dev_to_genpd(dev);
        if (IS_ERR(genpd))
		...

when we're sure that dev->pm_domain points at a generic PM domain (in
the various runtime PM functions.)  It should be also noted that
dev_to_genpd() is exported to the whole kernel - is that safe?

In any case, I think dev_to_genpd() at least needs some commentry before
it saying that it is only for use when we know for certain that the
device has a generic_pm_domain attached, or the PM domain power is NULL
or an error pointer.

This is turning out to be a right can of worms...
Kevin Hilman March 13, 2015, 4:33 p.m. UTC | #5
Russell King <rmk+kernel@arm.linux.org.uk> writes:

> pm_genpd_remove_device() should only be called with valid and present
> pm domain.  

Terminology nit: the patch adds a check for a valid an present genpd,
not pm domain (which I think of as dev->pm_domain.)

> There are circumstances where we may end up with something
> that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
> stuff.)

Wouldn't that particular case be caught by this check in pm_genpd_remove_device():

        || pd_to_genpd(dev->pm_domain) != genpd

Maybe I'm not quite following your description, but it looks like
someone is trying to call pm_genpd_remove_device() on something that was
not added with pm_genpd_add_device().  Where/how is that happening?

Kevin

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux March 13, 2015, 4:58 p.m. UTC | #6
On Fri, Mar 13, 2015 at 09:33:14AM -0700, Kevin Hilman wrote:
> Russell King <rmk+kernel@arm.linux.org.uk> writes:
> 
> > pm_genpd_remove_device() should only be called with valid and present
> > pm domain.  
> 
> Terminology nit: the patch adds a check for a valid an present genpd,
> not pm domain (which I think of as dev->pm_domain.)

Right, people are sloppy with terminology, it's my turn to be equally
sloppy :)  I'll fix it eventually (but I've just sent the next round of
patches.)  Can you re-comment on the new set so it doesn't get lost
please?  I'm not about to immediately go through another update round for
such a trivial change (which alone takes about 5 minutes due to the
number of branches and patches on top of this stuff).

> > There are circumstances where we may end up with something
> > that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
> > stuff.)
> 
> Wouldn't that particular case be caught by this check in pm_genpd_remove_device():
> 
>         || pd_to_genpd(dev->pm_domain) != genpd

Well, let's look at what pd_to_genpd() does:

static inline struct generic_pm_domain *pd_to_genpd(struct dev_pm_domain *pd)
{
        return container_of(pd, struct generic_pm_domain, domain);
}

Now, both pd_to_genpd() and dev_to_genpd() are available for use outside
of the genpd code.  What that means is that other code _can_ decide to
do this (and indeed I do have code which does this):

        struct generic_pm_domain *genpd = dev_to_genpd(dev);

        while (1) {
                if (pm_genpd_remove_device(genpd, dev) != -EAGAIN)
                        break;
                cond_resched();
        }

Now, as there is no way for anything outside of genpd to know whether the
return value from this is a real genpd structure or not - and
pm_genpd_remove_device() tries hard to validate the genpd passed in - but
not sufficiently to catch bad cases where dev->pm_domain is not attached
to a genpd.

What I'm doing is ensuring that the weak-but-looking-like-belt-and-braces
test in pm_genpd_remove_device() is actually sufficiently strong to catch
bad uses which may otherwise result in subtle out-of-bounds corruption or
list corruption.

Now, the alternative solution would be to have pm_genpd_lookup_name()
exported from the genpd code, so I can actually look up the genpd which
is _supposed_ to be registered against the device and use that instead.
However, that's potentially dangerous because there is no refcounting
against genpds obtained in this way.

So I decided better validation was the lesser of the evils.
Rafael J. Wysocki March 14, 2015, 1:27 a.m. UTC | #7
On Friday, March 13, 2015 01:45:52 PM Geert Uytterhoeven wrote:
> On Fri, Mar 13, 2015 at 10:20 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Fri, Mar 13, 2015 at 09:56:02AM +0100, Ulf Hansson wrote:
> >> On 12 March 2015 at 19:31, Russell King <rmk+kernel@arm.linux.org.uk> wrote:
> >> > pm_genpd_remove_device() should only be called with valid and present
> >> > pm domain.  There are circumstances where we may end up with something
> >> > that isn't a generic PM domain in dev->pm_domain (eg, vga_switcheroo
> >> > stuff.)
> >>
> >> Could the "vga_switcheroo" code deal with this instead?
> >
> > How is there any possibility what so ever that vga_switcherroo could
> > deal with this?
> >
> > The problem is if something which isn't a generic PM domain is registered
> > in dev->pm_domain, pm_genpd_remove_device() will treat it as a generic PM
> > domain, and try and de-reference it as if it were a generic PM domain.
> >
> > The problem is the generic PM domain code _assuming_ that whatever it
> > finds in dev->pm_domain is always a generic PM domain.  That is a broken
> > assumption.
> 
> Indeed, currently it's not possible to know which derivative of dev_pm_domain
> is used. For genpd, you have to look it up in the genpd list.
> This also complicates adding debug code.
> In addition, it can be statically or dynamically allocated. Hence sometimes %ps
> can tell you the type, sometimes it can't.
> 
> Should we add a type field to struct dev_pm_domain?

Then we'd have to keep track of all dev_pm_domain derivatives somehow and I'd
prefer to avoid that if possible.
diff mbox

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index b3fbc21da2dc..11a1023fa64a 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -1509,7 +1509,7 @@  int pm_genpd_remove_device(struct generic_pm_domain *genpd,
 
 	dev_dbg(dev, "%s()\n", __func__);
 
-	if (IS_ERR_OR_NULL(genpd) || IS_ERR_OR_NULL(dev)
+	if (!pm_genpd_present(genpd) || IS_ERR_OR_NULL(dev)
 	    ||  IS_ERR_OR_NULL(dev->pm_domain)
 	    ||  pd_to_genpd(dev->pm_domain) != genpd)
 		return -EINVAL;