Message ID | 1417690603-1595-1-git-send-email-tiwai@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > Signed-off-by: Takashi Iwai <tiwai@suse.de> Applied to my drm-misc branch. The second patch seems to be a dupe of commit bd008e5b2953186fc0c6633a885ade95e7043800 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Oct 7 14:13:51 2014 +0100 drm: Implement O_NONBLOCK support on /dev/dri/cardN so maybe we should backport that one once it's landed in 3.19? -Daniel > --- > drivers/gpu/drm/drm_fops.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > index ed7bc68f7e87..a82dc28d54f3 100644 > --- a/drivers/gpu/drm/drm_fops.c > +++ b/drivers/gpu/drm/drm_fops.c > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > if (copy_to_user(buffer + total, > e->event, e->event->length)) { > total = -EFAULT; > + e->destroy(e); > break; > } > > -- > 2.1.3 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel
At Thu, 4 Dec 2014 12:46:16 +0100, Daniel Vetter wrote: > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > Applied to my drm-misc branch. The second patch seems to be a dupe of > > commit bd008e5b2953186fc0c6633a885ade95e7043800 > Author: Chris Wilson <chris@chris-wilson.co.uk> > Date: Tue Oct 7 14:13:51 2014 +0100 > > drm: Implement O_NONBLOCK support on /dev/dri/cardN > > so maybe we should backport that one once it's landed in 3.19? Ah thanks, I misread Chris' comment as if it were for libdrm, and now understood correctly. Takashi > -Daniel > > > --- > > drivers/gpu/drm/drm_fops.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > index ed7bc68f7e87..a82dc28d54f3 100644 > > --- a/drivers/gpu/drm/drm_fops.c > > +++ b/drivers/gpu/drm/drm_fops.c > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > if (copy_to_user(buffer + total, > > e->event, e->event->length)) { > > total = -EFAULT; > > + e->destroy(e); > > break; > > } > > > > -- > > 2.1.3 > > > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/dri-devel > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch >
On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > Signed-off-by: Takashi Iwai <tiwai@suse.de> > --- > drivers/gpu/drm/drm_fops.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > index ed7bc68f7e87..a82dc28d54f3 100644 > --- a/drivers/gpu/drm/drm_fops.c > +++ b/drivers/gpu/drm/drm_fops.c > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > if (copy_to_user(buffer + total, > e->event, e->event->length)) { > total = -EFAULT; > + e->destroy(e); We shouldn't just be throwing away the event here, but put the event back at the front of the queue. Poses an interesting race issue. Seems like we want to hold the spinlock until the copy is complete so that we can fix up the failure correctly. -Chris
At Thu, 04 Dec 2014 12:50:04 +0100, Takashi Iwai wrote: > > At Thu, 4 Dec 2014 12:46:16 +0100, > Daniel Vetter wrote: > > > > so maybe we should backport that one once it's landed in 3.19? Speaking about this: yes, it's worth, IMO, as this (unexpectedly) seems really fixing the issue on the current driver. thanks, Takashi
At Thu, 4 Dec 2014 11:51:14 +0000, Chris Wilson wrote: > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > --- > > drivers/gpu/drm/drm_fops.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > index ed7bc68f7e87..a82dc28d54f3 100644 > > --- a/drivers/gpu/drm/drm_fops.c > > +++ b/drivers/gpu/drm/drm_fops.c > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > if (copy_to_user(buffer + total, > > e->event, e->event->length)) { > > total = -EFAULT; > > + e->destroy(e); > > We shouldn't just be throwing away the event here, but put the event > back at the front of the queue. Poses an interesting race issue. Seems > like we want to hold the spinlock until the copy is complete so that we > can fix up the failure correctly. Yeah, I thought of it while writing this, but as a starter, I tried the simpler one. (And I didn't realize your comment referring to the already existing fix in kernel, sorry!) The problem to hold the spinlock for the whole is that you can't it with copy_to_user(). So it'd be a bit tricky. And, why not using event_wait.lock instead of the extra event_lock? Then we can use wait_event_lock_*() variant that covers more race between dequeuing and wait_event. Takashi
On Thu, Dec 04, 2014 at 12:56:38PM +0100, Takashi Iwai wrote: > At Thu, 4 Dec 2014 11:51:14 +0000, > Chris Wilson wrote: > > > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > > --- > > > drivers/gpu/drm/drm_fops.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > > index ed7bc68f7e87..a82dc28d54f3 100644 > > > --- a/drivers/gpu/drm/drm_fops.c > > > +++ b/drivers/gpu/drm/drm_fops.c > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > > if (copy_to_user(buffer + total, > > > e->event, e->event->length)) { > > > total = -EFAULT; > > > + e->destroy(e); > > > > We shouldn't just be throwing away the event here, but put the event > > back at the front of the queue. Poses an interesting race issue. Seems > > like we want to hold the spinlock until the copy is complete so that we > > can fix up the failure correctly. > > Yeah, I thought of it while writing this, but as a starter, I tried > the simpler one. (And I didn't realize your comment referring to the > already existing fix in kernel, sorry!) > > The problem to hold the spinlock for the whole is that you can't it > with copy_to_user(). So it'd be a bit tricky. We can use access_ok(VERFIY_WRITE) and then copy_to_user_inatomic(). Doable with a bit of code rearrangement. > And, why not using event_wait.lock instead of the extra event_lock? > Then we can use wait_event_lock_*() variant that covers more race > between dequeuing and wait_event. The tricky part would appear to be then protecting dev->vblank_event_list. That looks like it could be an RCU list instead? -Chris
At Thu, 4 Dec 2014 12:13:46 +0000, Chris Wilson wrote: > > On Thu, Dec 04, 2014 at 12:56:38PM +0100, Takashi Iwai wrote: > > At Thu, 4 Dec 2014 11:51:14 +0000, > > Chris Wilson wrote: > > > > > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > > > --- > > > > drivers/gpu/drm/drm_fops.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > > > index ed7bc68f7e87..a82dc28d54f3 100644 > > > > --- a/drivers/gpu/drm/drm_fops.c > > > > +++ b/drivers/gpu/drm/drm_fops.c > > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > > > if (copy_to_user(buffer + total, > > > > e->event, e->event->length)) { > > > > total = -EFAULT; > > > > + e->destroy(e); > > > > > > We shouldn't just be throwing away the event here, but put the event > > > back at the front of the queue. Poses an interesting race issue. Seems > > > like we want to hold the spinlock until the copy is complete so that we > > > can fix up the failure correctly. > > > > Yeah, I thought of it while writing this, but as a starter, I tried > > the simpler one. (And I didn't realize your comment referring to the > > already existing fix in kernel, sorry!) > > > > The problem to hold the spinlock for the whole is that you can't it > > with copy_to_user(). So it'd be a bit tricky. > > We can use access_ok(VERFIY_WRITE) and then copy_to_user_inatomic(). > Doable with a bit of code rearrangement. Yes, but this is really not necessarily atomic. We can simply prepend the event back in the error case, instead, too. > > And, why not using event_wait.lock instead of the extra event_lock? > > Then we can use wait_event_lock_*() variant that covers more race > > between dequeuing and wait_event. > > The tricky part would appear to be then protecting > dev->vblank_event_list. That looks like it could be an RCU list instead? Sounds interesting. Takashi
On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote: > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > --- > > drivers/gpu/drm/drm_fops.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > index ed7bc68f7e87..a82dc28d54f3 100644 > > --- a/drivers/gpu/drm/drm_fops.c > > +++ b/drivers/gpu/drm/drm_fops.c > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > if (copy_to_user(buffer + total, > > e->event, e->event->length)) { > > total = -EFAULT; > > + e->destroy(e); > > We shouldn't just be throwing away the event here, but put the event > back at the front of the queue. Poses an interesting race issue. Seems > like we want to hold the spinlock until the copy is complete so that we > can fix up the failure correctly. I've read the manpage for read and it explicitly states that when you get an error it's undefined what happens to the read position. Since -EFAULT is really just a userspace bug I think we can happily drop the event on the floor, no reason to bend over in the kernel. I'll add a note to the commit message. -Daniel
On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote: > On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote: > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > > --- > > > drivers/gpu/drm/drm_fops.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > > index ed7bc68f7e87..a82dc28d54f3 100644 > > > --- a/drivers/gpu/drm/drm_fops.c > > > +++ b/drivers/gpu/drm/drm_fops.c > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > > if (copy_to_user(buffer + total, > > > e->event, e->event->length)) { > > > total = -EFAULT; > > > + e->destroy(e); > > > > We shouldn't just be throwing away the event here, but put the event > > back at the front of the queue. Poses an interesting race issue. Seems > > like we want to hold the spinlock until the copy is complete so that we > > can fix up the failure correctly. > > I've read the manpage for read and it explicitly states that when you get > an error it's undefined what happens to the read position. Since -EFAULT > is really just a userspace bug I think we can happily drop the event on > the floor, no reason to bend over in the kernel. Hmm. Actually the code is buggy is the provided buffer is too short for the first event in O_NONBLOCK mode. -Chris
On Thu, Dec 04, 2014 at 04:31:15PM +0000, Chris Wilson wrote: > On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote: > > On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote: > > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > > > --- > > > > drivers/gpu/drm/drm_fops.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > > > index ed7bc68f7e87..a82dc28d54f3 100644 > > > > --- a/drivers/gpu/drm/drm_fops.c > > > > +++ b/drivers/gpu/drm/drm_fops.c > > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > > > if (copy_to_user(buffer + total, > > > > e->event, e->event->length)) { > > > > total = -EFAULT; > > > > + e->destroy(e); > > > > > > We shouldn't just be throwing away the event here, but put the event > > > back at the front of the queue. Poses an interesting race issue. Seems > > > like we want to hold the spinlock until the copy is complete so that we > > > can fix up the failure correctly. > > > > I've read the manpage for read and it explicitly states that when you get > > an error it's undefined what happens to the read position. Since -EFAULT > > is really just a userspace bug I think we can happily drop the event on > > the floor, no reason to bend over in the kernel. > > Hmm. Actually the code is buggy is the provided buffer is too short for > the first event in O_NONBLOCK mode. Well we essentially send out datagrams instead of a bytestream. If we look at recvmsg and friends then discarding the additional bytes is something that's already being done. So I'm not terribly concerned about that either. It's a bit non-pretty that we use read and not reicvmsg but since we use read already not something we can ever fix. -Daniel
On Thu, Dec 04, 2014 at 06:25:08PM +0100, Daniel Vetter wrote: > On Thu, Dec 04, 2014 at 04:31:15PM +0000, Chris Wilson wrote: > > On Thu, Dec 04, 2014 at 01:28:39PM +0100, Daniel Vetter wrote: > > > On Thu, Dec 04, 2014 at 11:51:14AM +0000, Chris Wilson wrote: > > > > On Thu, Dec 04, 2014 at 11:56:42AM +0100, Takashi Iwai wrote: > > > > > Signed-off-by: Takashi Iwai <tiwai@suse.de> > > > > > --- > > > > > drivers/gpu/drm/drm_fops.c | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c > > > > > index ed7bc68f7e87..a82dc28d54f3 100644 > > > > > --- a/drivers/gpu/drm/drm_fops.c > > > > > +++ b/drivers/gpu/drm/drm_fops.c > > > > > @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, > > > > > if (copy_to_user(buffer + total, > > > > > e->event, e->event->length)) { > > > > > total = -EFAULT; > > > > > + e->destroy(e); > > > > > > > > We shouldn't just be throwing away the event here, but put the event > > > > back at the front of the queue. Poses an interesting race issue. Seems > > > > like we want to hold the spinlock until the copy is complete so that we > > > > can fix up the failure correctly. > > > > > > I've read the manpage for read and it explicitly states that when you get > > > an error it's undefined what happens to the read position. Since -EFAULT > > > is really just a userspace bug I think we can happily drop the event on > > > the floor, no reason to bend over in the kernel. > > > > Hmm. Actually the code is buggy is the provided buffer is too short for > > the first event in O_NONBLOCK mode. > > Well we essentially send out datagrams instead of a bytestream. If we look > at recvmsg and friends then discarding the additional bytes is something > that's already being done. So I'm not terribly concerned about that > either. It's a bit non-pretty that we use read and not reicvmsg but since > we use read already not something we can ever fix. In all of this consider what the impact of dropping an event is: system lockup. -Chris
diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c index ed7bc68f7e87..a82dc28d54f3 100644 --- a/drivers/gpu/drm/drm_fops.c +++ b/drivers/gpu/drm/drm_fops.c @@ -525,6 +525,7 @@ ssize_t drm_read(struct file *filp, char __user *buffer, if (copy_to_user(buffer + total, e->event, e->event->length)) { total = -EFAULT; + e->destroy(e); break; }
Signed-off-by: Takashi Iwai <tiwai@suse.de> --- drivers/gpu/drm/drm_fops.c | 1 + 1 file changed, 1 insertion(+)