Message ID | 149866562059.23475.15965626912972737879@mail.alporthouse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote: > Quoting Sean Paul (2017-06-28 16:51:11) > > Protect against long-running processes from overflowing the timeline > > and creating fences that go back in time. While we're at it, avoid > > overflowing while we're incrementing the timeline. > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org> > > --- > > drivers/dma-buf/sw_sync.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > > index 69c5ff36e2f9..40934619ed88 100644 > > --- a/drivers/dma-buf/sw_sync.c > > +++ b/drivers/dma-buf/sw_sync.c > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > > > - obj->value += inc; > > + obj->value += min(inc, ~0x0U - obj->value); > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX); > Hi Chris, Thanks for the review. I don't think that solves the same problem I was trying to solve. The issue is that android userspace increments value by 0x7fffffff twice in order to ensure all fences have signaled. This is causing value to overflow and is_signaled will never be true. With your snippet, the possibility of overflow still exists. > Better of course would be to report the error, AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping value at UINT_MAX will ensure all fences are signaled, and the check below ensures that fences can't be created beyond that (returning an error at that point in time). Sean > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > index 69c5ff36e2f9..2503cf884018 100644 > --- a/drivers/dma-buf/sw_sync.c > +++ b/drivers/dma-buf/sw_sync.c > @@ -345,6 +345,9 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg) > if (copy_from_user(&value, (void __user *)arg, sizeof(value))) > return -EFAULT; > > + if (value > INT_MAX) > + return -EINVAL; > + > sync_timeline_signal(obj, value); > > > > > list_for_each_entry_safe(pt, next, &obj->active_list_head, > > active_list) { > > @@ -178,6 +178,11 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size, > > return NULL; > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > + if (value < obj->value) { > > + spin_unlock_irqrestore(&obj->child_list_lock, flags); > > + return NULL; > > + } > > Needs a u32 check. if ((int)(value - obj->value) < 0) return some_error; > -Chris
Quoting Sean Paul (2017-06-28 17:47:24) > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote: > > Quoting Sean Paul (2017-06-28 16:51:11) > > > Protect against long-running processes from overflowing the timeline > > > and creating fences that go back in time. While we're at it, avoid > > > overflowing while we're incrementing the timeline. > > > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org> > > > --- > > > drivers/dma-buf/sw_sync.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > > > index 69c5ff36e2f9..40934619ed88 100644 > > > --- a/drivers/dma-buf/sw_sync.c > > > +++ b/drivers/dma-buf/sw_sync.c > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > > > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > > > > > - obj->value += inc; > > > + obj->value += min(inc, ~0x0U - obj->value); > > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX); > > > Hi Chris, > Thanks for the review. > > I don't think that solves the same problem I was trying to solve. The issue is > that android userspace increments value by 0x7fffffff twice in order to ensure > all fences have signaled. This is causing value to overflow and is_signaled will > never be true. With your snippet, the possibility of overflow still exists. > > > Better of course would be to report the error, > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping > value at UINT_MAX will ensure all fences are signaled, and the check below ensures > that fences can't be created beyond that (returning an error at that point in > time). UINT_MAX doesn't imply all fences will be signaled either, the timeline is supposed to wrap. The issue is timeline_fence_signaled() is using the wrong test, it should be return (int)(fence->seqno - parent->value) <= 0; If it helps extract a little helper from dma_fence_is_later(). -Chris
On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote: > Quoting Sean Paul (2017-06-28 17:47:24) > > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote: > > > Quoting Sean Paul (2017-06-28 16:51:11) > > > > Protect against long-running processes from overflowing the timeline > > > > and creating fences that go back in time. While we're at it, avoid > > > > overflowing while we're incrementing the timeline. > > > > > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org> > > > > --- > > > > drivers/dma-buf/sw_sync.c | 7 ++++++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > > > > index 69c5ff36e2f9..40934619ed88 100644 > > > > --- a/drivers/dma-buf/sw_sync.c > > > > +++ b/drivers/dma-buf/sw_sync.c > > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > > > > > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > > > > > > > - obj->value += inc; > > > > + obj->value += min(inc, ~0x0U - obj->value); > > > > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX); > > > > > Hi Chris, > > Thanks for the review. > > > > I don't think that solves the same problem I was trying to solve. The issue is > > that android userspace increments value by 0x7fffffff twice in order to ensure > > all fences have signaled. This is causing value to overflow and is_signaled will > > never be true. With your snippet, the possibility of overflow still exists. > > > > > Better of course would be to report the error, > > > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping > > value at UINT_MAX will ensure all fences are signaled, and the check below ensures > > that fences can't be created beyond that (returning an error at that point in > > time). > > UINT_MAX doesn't imply all fences will be signaled either, the timeline > is supposed to wrap. > > The issue is timeline_fence_signaled() is using the wrong test, it > should be return (int)(fence->seqno - parent->value) <= 0; If it helps > extract a little helper from dma_fence_is_later(). Understood, thank you for clarifying. This still doesn't solve the issue of userspace jumping the timeline by INT_MAX multiple times. In that case, value will rollover and even the new signaled() will fail to report. Sean > -Chris
I think the kernel has problems with Android fences which were slowly broken as they were de-staged: 1. They allowed for fence/timeline value/seqno to overflow/rollover and that would break only if delta between timeline and earlier unsignaled fence exceeded 2^31 or so. Android relies on that behavior. It was also possible to clean-up timeline by incrementing it for instance by calling timeline_inc(0x7FFFFFFF) twice. This doesn't work anymore because if no one is actively waiting for fence, active_list is empty and timeline_inc does not signal fences anymore; just increments timeline value and allows it to rollover without any effect. 2. They did guarantee that when timeline was destroyed all fences on timeline were signaled (or error state). Thus, if timeline was destroyed, or if process that owned the timeline died, processes that depended on it would not hang. Android also relies on that behavior for cleanup when a process crashes for example. 3. It seems from some stack traces that I have seen that timeline_inc signals fence from inside spinlock, which can cause fence_array to call fence_array_release, which calls flush_work() e.g. <4>[ 140.762030] [<ffffffff858939fb>] dump_stack+0x4d/0x63 <4>[ 140.762035] [<ffffffff8568b593>] ___might_sleep+0x149/0x14e <4>[ 140.762039] [<ffffffff8568b637>] __might_sleep+0x9f/0xa6 <4>[ 140.762045] [<ffffffff8567ef2a>] flush_work+0x39/0x19a <4>[ 140.762049] [<ffffffff8568eb5c>] ? try_to_wake_up+0x20b/0x21b <4>[ 140.762055] [<ffffffff85a54cea>] fence_array_release+0x2e/0x63 <4>[ 140.762058] [<ffffffff85a53a65>] fence_release+0x82/0x8e <4>[ 140.762061] [<ffffffff85a54cba>] fence_put+0x15/0x17 <4>[ 140.762065] [<ffffffff85a54e08>] fence_array_cb_func+0x1f/0x39 <4>[ 140.762068] [<ffffffff85a53881>] fence_signal_locked+0x8e/0xa3 <4>[ 140.762072] [<ffffffff85a55cda>] sync_timeline_signal+0xcd/0x10a <4>[ 140.762075] [<ffffffff85a5613b>] sw_sync_ioctl+0x159/0x17f On Wed, Jun 28, 2017 at 2:00 PM, Sean Paul <seanpaul@chromium.org> wrote: > On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote: > > Quoting Sean Paul (2017-06-28 17:47:24) > > > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote: > > > > Quoting Sean Paul (2017-06-28 16:51:11) > > > > > Protect against long-running processes from overflowing the > timeline > > > > > and creating fences that go back in time. While we're at it, avoid > > > > > overflowing while we're incrementing the timeline. > > > > > > > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org> > > > > > --- > > > > > drivers/dma-buf/sw_sync.c | 7 ++++++- > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > > > > > index 69c5ff36e2f9..40934619ed88 100644 > > > > > --- a/drivers/dma-buf/sw_sync.c > > > > > +++ b/drivers/dma-buf/sw_sync.c > > > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct > sync_timeline *obj, unsigned int inc) > > > > > > > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > > > > > > > > > - obj->value += inc; > > > > > + obj->value += min(inc, ~0x0U - obj->value); > > > > > > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX); > > > > > > > Hi Chris, > > > Thanks for the review. > > > > > > I don't think that solves the same problem I was trying to solve. The > issue is > > > that android userspace increments value by 0x7fffffff twice in order > to ensure > > > all fences have signaled. This is causing value to overflow and > is_signaled will > > > never be true. With your snippet, the possibility of overflow still > exists. > > > > > > > Better of course would be to report the error, > > > > > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. > Capping > > > value at UINT_MAX will ensure all fences are signaled, and the check > below ensures > > > that fences can't be created beyond that (returning an error at that > point in > > > time). > > > > UINT_MAX doesn't imply all fences will be signaled either, the timeline > > is supposed to wrap. > > > > The issue is timeline_fence_signaled() is using the wrong test, it > > should be return (int)(fence->seqno - parent->value) <= 0; If it helps > > extract a little helper from dma_fence_is_later(). > > Understood, thank you for clarifying. This still doesn't solve the issue > of userspace > jumping the timeline by INT_MAX multiple times. In that case, value will > rollover and > even the new signaled() will fail to report. > > Sean > > > -Chris > > -- > Sean Paul, Software Engineer, Google / Chromium OS > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
Quoting Sean Paul (2017-06-28 22:00:02) > On Wed, Jun 28, 2017 at 08:45:55PM +0100, Chris Wilson wrote: > > Quoting Sean Paul (2017-06-28 17:47:24) > > > On Wed, Jun 28, 2017 at 05:00:20PM +0100, Chris Wilson wrote: > > > > Quoting Sean Paul (2017-06-28 16:51:11) > > > > > Protect against long-running processes from overflowing the timeline > > > > > and creating fences that go back in time. While we're at it, avoid > > > > > overflowing while we're incrementing the timeline. > > > > > > > > > > Signed-off-by: Sean Paul <seanpaul@chromium.org> > > > > > --- > > > > > drivers/dma-buf/sw_sync.c | 7 ++++++- > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c > > > > > index 69c5ff36e2f9..40934619ed88 100644 > > > > > --- a/drivers/dma-buf/sw_sync.c > > > > > +++ b/drivers/dma-buf/sw_sync.c > > > > > @@ -142,7 +142,7 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc) > > > > > > > > > > spin_lock_irqsave(&obj->child_list_lock, flags); > > > > > > > > > > - obj->value += inc; > > > > > + obj->value += min(inc, ~0x0U - obj->value); > > > > > > > > The timeline uses u32 seqno, so just obj->value += min(inc, INT_MAX); > > > > > > > Hi Chris, > > > Thanks for the review. > > > > > > I don't think that solves the same problem I was trying to solve. The issue is > > > that android userspace increments value by 0x7fffffff twice in order to ensure > > > all fences have signaled. This is causing value to overflow and is_signaled will > > > never be true. With your snippet, the possibility of overflow still exists. > > > > > > > Better of course would be to report the error, > > > > > > AFAIK, it's not an error to jump the timeline, perhaps just bad taste. Capping > > > value at UINT_MAX will ensure all fences are signaled, and the check below ensures > > > that fences can't be created beyond that (returning an error at that point in > > > time). > > > > UINT_MAX doesn't imply all fences will be signaled either, the timeline > > is supposed to wrap. > > > > The issue is timeline_fence_signaled() is using the wrong test, it > > should be return (int)(fence->seqno - parent->value) <= 0; If it helps > > extract a little helper from dma_fence_is_later(). > > Understood, thank you for clarifying. This still doesn't solve the issue of userspace > jumping the timeline by INT_MAX multiple times. In that case, value will rollover and > even the new signaled() will fail to report. Right, all the intermediate fences need to be signaled as the timeline jumps. That seems to be handled by walking the active_list. However, that is broken because not all fences are on the active_list and the active_list itself is not updated atomically! -Chris
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 69c5ff36e2f9..2503cf884018 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -345,6 +345,9 @@ static long sw_sync_ioctl_inc(struct sync_timeline *obj, unsigned long arg) if (copy_from_user(&value, (void __user *)arg, sizeof(value))) return -EFAULT; + if (value > INT_MAX) + return -EINVAL; + sync_timeline_signal(obj, value); >