[1/2] drm: Fix memory leak at error path of drm_read()

Message ID	1417690603-1595-1-git-send-email-tiwai@suse.de (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Takashi Iwai <tiwai@suse.de> To: dri-devel@lists.freedesktop.org Subject: [PATCH 1/2] drm: Fix memory leak at error path of drm_read() Date: Thu, 4 Dec 2014 11:56:42 +0100 Message-Id: <1417690603-1595-1-git-send-email-tiwai@suse.de> Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Takashi Iwai Dec. 4, 2014, 10:56 a.m. UTC

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 drivers/gpu/drm/drm_fops.c | 1 +
 1 file changed, 1 insertion(+)

Daniel Vetter Dec. 4, 2014, 11:46 a.m. UTC | #1

On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> Signed-off-by: Takashi Iwai <tiwai@suse.de>

Applied to my drm-misc branch. The second patch seems to be a dupe of

commit bd008e5b2953186fc0c6633a885ade95e7043800
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 7 14:13:51 2014 +0100

    drm: Implement O_NONBLOCK support on /dev/dri/cardN

so maybe we should backport that one once it's landed in 3.19?
-Daniel

> ---
>  drivers/gpu/drm/drm_fops.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> index ed7bc68f7e87..a82dc28d54f3 100644
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
>  		if (copy_to_user(buffer + total,
>  				 e->event, e->event->length)) {
>  			total = -EFAULT;
> +			e->destroy(e);
>  			break;
>  		}
>  
> -- 
> 2.1.3
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

Takashi Iwai Dec. 4, 2014, 11:50 a.m. UTC | #2

At Thu, 4 Dec 2014 12:46:16 +0100,
Daniel Vetter wrote:
> 
> On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> 
> Applied to my drm-misc branch. The second patch seems to be a dupe of
> 
> commit bd008e5b2953186fc0c6633a885ade95e7043800
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Oct 7 14:13:51 2014 +0100
> 
>     drm: Implement O_NONBLOCK support on /dev/dri/cardN
> 
> so maybe we should backport that one once it's landed in 3.19?

Ah thanks, I misread Chris' comment as if it were for libdrm, and now
understood correctly.


Takashi

> -Daniel
> 
> > ---
> >  drivers/gpu/drm/drm_fops.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > index ed7bc68f7e87..a82dc28d54f3 100644
> > --- a/drivers/gpu/drm/drm_fops.c
> > +++ b/drivers/gpu/drm/drm_fops.c
> > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> >  		if (copy_to_user(buffer + total,
> >  				 e->event, e->event->length)) {
> >  			total = -EFAULT;
> > +			e->destroy(e);
> >  			break;
> >  		}
> >  
> > -- 
> > 2.1.3
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>

Chris Wilson Dec. 4, 2014, 11:51 a.m. UTC | #3

On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  drivers/gpu/drm/drm_fops.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> index ed7bc68f7e87..a82dc28d54f3 100644
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
>  		if (copy_to_user(buffer + total,
>  				 e->event, e->event->length)) {
>  			total = -EFAULT;
> +			e->destroy(e);

We shouldn't just be throwing away the event here, but put the event
back at the front of the queue. Poses an interesting race issue. Seems
like we want to hold the spinlock until the copy is complete so that we
can fix up the failure correctly.
-Chris

Takashi Iwai Dec. 4, 2014, 11:51 a.m. UTC | #4

At Thu, 04 Dec 2014 12:50:04 +0100,
Takashi Iwai wrote:
> 
> At Thu, 4 Dec 2014 12:46:16 +0100,
> Daniel Vetter wrote:
> > 
> > so maybe we should backport that one once it's landed in 3.19?

Speaking about this: yes, it's worth, IMO, as this (unexpectedly)
seems really fixing the issue on the current driver.

thanks,

Takashi

Takashi Iwai Dec. 4, 2014, 11:56 a.m. UTC | #5

At Thu, 4 Dec 2014 11:51:14 +0000,
Chris Wilson wrote:
> 
> On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >  drivers/gpu/drm/drm_fops.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > index ed7bc68f7e87..a82dc28d54f3 100644
> > --- a/drivers/gpu/drm/drm_fops.c
> > +++ b/drivers/gpu/drm/drm_fops.c
> > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> >  		if (copy_to_user(buffer + total,
> >  				 e->event, e->event->length)) {
> >  			total = -EFAULT;
> > +			e->destroy(e);
> 
> We shouldn't just be throwing away the event here, but put the event
> back at the front of the queue. Poses an interesting race issue. Seems
> like we want to hold the spinlock until the copy is complete so that we
> can fix up the failure correctly.

Yeah, I thought of it while writing this, but as a starter, I tried
the simpler one.  (And I didn't realize your comment referring to the
already existing fix in kernel, sorry!)

The problem to hold the spinlock for the whole is that you can't it
with copy_to_user().  So it'd be a bit tricky.

And, why not using event_wait.lock instead of the extra event_lock?
Then we can use wait_event_lock_*() variant that covers more race
between dequeuing and wait_event.

Takashi

Chris Wilson Dec. 4, 2014, 12:13 p.m. UTC | #6

On Thu, Dec 04, 2014 at 12:56:38PM +0100, Takashi Iwai wrote:
> At Thu, 4 Dec 2014 11:51:14 +0000,
> Chris Wilson wrote:
> > 
> > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > ---
> > >  drivers/gpu/drm/drm_fops.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > > index ed7bc68f7e87..a82dc28d54f3 100644
> > > --- a/drivers/gpu/drm/drm_fops.c
> > > +++ b/drivers/gpu/drm/drm_fops.c
> > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> > >  		if (copy_to_user(buffer + total,
> > >  				 e->event, e->event->length)) {
> > >  			total = -EFAULT;
> > > +			e->destroy(e);
> > 
> > We shouldn't just be throwing away the event here, but put the event
> > back at the front of the queue. Poses an interesting race issue. Seems
> > like we want to hold the spinlock until the copy is complete so that we
> > can fix up the failure correctly.
> 
> Yeah, I thought of it while writing this, but as a starter, I tried
> the simpler one.  (And I didn't realize your comment referring to the
> already existing fix in kernel, sorry!)
> 
> The problem to hold the spinlock for the whole is that you can't it
> with copy_to_user().  So it'd be a bit tricky.

We can use access_ok(VERFIY_WRITE) and then copy_to_user_inatomic().
Doable with a bit of code rearrangement.
 
> And, why not using event_wait.lock instead of the extra event_lock?
> Then we can use wait_event_lock_*() variant that covers more race
> between dequeuing and wait_event.

The tricky part would appear to be then protecting
dev->vblank_event_list. That looks like it could be an RCU list instead?
-Chris

Takashi Iwai Dec. 4, 2014, 12:27 p.m. UTC | #7

At Thu, 4 Dec 2014 12:13:46 +0000,
Chris Wilson wrote:
> 
> On Thu, Dec 04, 2014 at 12:56:38PM +0100, Takashi Iwai wrote:
> > At Thu, 4 Dec 2014 11:51:14 +0000,
> > Chris Wilson wrote:
> > > 
> > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > > ---
> > > >  drivers/gpu/drm/drm_fops.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > > > index ed7bc68f7e87..a82dc28d54f3 100644
> > > > --- a/drivers/gpu/drm/drm_fops.c
> > > > +++ b/drivers/gpu/drm/drm_fops.c
> > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> > > >  		if (copy_to_user(buffer + total,
> > > >  				 e->event, e->event->length)) {
> > > >  			total = -EFAULT;
> > > > +			e->destroy(e);
> > > 
> > > We shouldn't just be throwing away the event here, but put the event
> > > back at the front of the queue. Poses an interesting race issue. Seems
> > > like we want to hold the spinlock until the copy is complete so that we
> > > can fix up the failure correctly.
> > 
> > Yeah, I thought of it while writing this, but as a starter, I tried
> > the simpler one.  (And I didn't realize your comment referring to the
> > already existing fix in kernel, sorry!)
> > 
> > The problem to hold the spinlock for the whole is that you can't it
> > with copy_to_user().  So it'd be a bit tricky.
> 
> We can use access_ok(VERFIY_WRITE) and then copy_to_user_inatomic().
> Doable with a bit of code rearrangement.

Yes, but this is really not necessarily atomic.  We can simply prepend
the event back in the error case, instead, too.

> > And, why not using event_wait.lock instead of the extra event_lock?
> > Then we can use wait_event_lock_*() variant that covers more race
> > between dequeuing and wait_event.
> 
> The tricky part would appear to be then protecting
> dev->vblank_event_list. That looks like it could be an RCU list instead?

Sounds interesting.


Takashi

Daniel Vetter Dec. 4, 2014, 12:28 p.m. UTC | #8

On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote:
> On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >  drivers/gpu/drm/drm_fops.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > index ed7bc68f7e87..a82dc28d54f3 100644
> > --- a/drivers/gpu/drm/drm_fops.c
> > +++ b/drivers/gpu/drm/drm_fops.c
> > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> >  		if (copy_to_user(buffer + total,
> >  				 e->event, e->event->length)) {
> >  			total = -EFAULT;
> > +			e->destroy(e);
> 
> We shouldn't just be throwing away the event here, but put the event
> back at the front of the queue. Poses an interesting race issue. Seems
> like we want to hold the spinlock until the copy is complete so that we
> can fix up the failure correctly.

I've read the manpage for read and it explicitly states that when you get
an error it's undefined what happens to the read position. Since -EFAULT
is really just a userspace bug I think we can happily drop the event on
the floor, no reason to bend over in the kernel.

I'll add a note to the commit message.
-Daniel

Chris Wilson Dec. 4, 2014, 4:31 p.m. UTC | #9

On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote:
> On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote:
> > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > ---
> > >  drivers/gpu/drm/drm_fops.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > > index ed7bc68f7e87..a82dc28d54f3 100644
> > > --- a/drivers/gpu/drm/drm_fops.c
> > > +++ b/drivers/gpu/drm/drm_fops.c
> > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> > >  		if (copy_to_user(buffer + total,
> > >  				 e->event, e->event->length)) {
> > >  			total = -EFAULT;
> > > +			e->destroy(e);
> > 
> > We shouldn't just be throwing away the event here, but put the event
> > back at the front of the queue. Poses an interesting race issue. Seems
> > like we want to hold the spinlock until the copy is complete so that we
> > can fix up the failure correctly.
> 
> I've read the manpage for read and it explicitly states that when you get
> an error it's undefined what happens to the read position. Since -EFAULT
> is really just a userspace bug I think we can happily drop the event on
> the floor, no reason to bend over in the kernel.

Hmm. Actually the code is buggy is the provided buffer is too short for
the first event in O_NONBLOCK mode.
-Chris

Daniel Vetter Dec. 4, 2014, 5:25 p.m. UTC | #10

On Thu, Dec 04, 2014 at 04:31:15PM +0000, Chris Wilson wrote:
> On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote:
> > On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote:
> > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > > ---
> > > >  drivers/gpu/drm/drm_fops.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > > > index ed7bc68f7e87..a82dc28d54f3 100644
> > > > --- a/drivers/gpu/drm/drm_fops.c
> > > > +++ b/drivers/gpu/drm/drm_fops.c
> > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> > > >  		if (copy_to_user(buffer + total,
> > > >  				 e->event, e->event->length)) {
> > > >  			total = -EFAULT;
> > > > +			e->destroy(e);
> > > 
> > > We shouldn't just be throwing away the event here, but put the event
> > > back at the front of the queue. Poses an interesting race issue. Seems
> > > like we want to hold the spinlock until the copy is complete so that we
> > > can fix up the failure correctly.
> > 
> > I've read the manpage for read and it explicitly states that when you get
> > an error it's undefined what happens to the read position. Since -EFAULT
> > is really just a userspace bug I think we can happily drop the event on
> > the floor, no reason to bend over in the kernel.
> 
> Hmm. Actually the code is buggy is the provided buffer is too short for
> the first event in O_NONBLOCK mode.

Well we essentially send out datagrams instead of a bytestream. If we look
at recvmsg and friends then discarding the additional bytes is something
that's already being done. So I'm not terribly concerned about that
either. It's a bit non-pretty that we use read and not reicvmsg but since
we use read already not something we can ever fix.
-Daniel

Chris Wilson Dec. 5, 2014, 8:01 a.m. UTC | #11

On Thu, Dec 04, 2014 at 06:25:08PM +0100, Daniel Vetter wrote:
> On Thu, Dec 04, 2014 at 04:31:15PM +0000, Chris Wilson wrote:
> > On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote:
> > > On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote:
> > > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote:
> > > > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > > > ---
> > > > >  drivers/gpu/drm/drm_fops.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
> > > > > index ed7bc68f7e87..a82dc28d54f3 100644
> > > > > --- a/drivers/gpu/drm/drm_fops.c
> > > > > +++ b/drivers/gpu/drm/drm_fops.c
> > > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer,
> > > > >  		if (copy_to_user(buffer + total,
> > > > >  				 e->event, e->event->length)) {
> > > > >  			total = -EFAULT;
> > > > > +			e->destroy(e);
> > > > 
> > > > We shouldn't just be throwing away the event here, but put the event
> > > > back at the front of the queue. Poses an interesting race issue. Seems
> > > > like we want to hold the spinlock until the copy is complete so that we
> > > > can fix up the failure correctly.
> > > 
> > > I've read the manpage for read and it explicitly states that when you get
> > > an error it's undefined what happens to the read position. Since -EFAULT
> > > is really just a userspace bug I think we can happily drop the event on
> > > the floor, no reason to bend over in the kernel.
> > 
> > Hmm. Actually the code is buggy is the provided buffer is too short for
> > the first event in O_NONBLOCK mode.
> 
> Well we essentially send out datagrams instead of a bytestream. If we look
> at recvmsg and friends then discarding the additional bytes is something
> that's already being done. So I'm not terribly concerned about that
> either. It's a bit non-pretty that we use read and not reicvmsg but since
> we use read already not something we can ever fix.

In all of this consider what the impact of dropping an event is: system
lockup.
-Chris

[1/2] drm: Fix memory leak at error path of drm_read()

Commit Message

Comments

Patch