Message ID | 1429798078-18987-2-git-send-email-peter.antoine@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Apr 23, 2015 at 03:07:54PM +0100, Peter Antoine wrote: > This patch fixes a possible kernel crash when drm_unlock (DRM_IOCTL_UNLOCK) > is called by a application that has not had a lock created by it. This > crash can be caused by any application from all users. Crashing the application is still a crash... -Chris
Before the patch the system required rebooting (driver crash and/or kernel panic). Now the application gets terminated. This follows the pattern in the other parts of this code that checks that pointer. Peter. -----Original Message----- From: Chris Wilson [mailto:chris@chris-wilson.co.uk] Sent: Thursday, April 23, 2015 3:20 PM To: Antoine, Peter Cc: intel-gfx@lists.freedesktop.org; airlied@redhat.com; dri-devel@lists.freedesktop.org; daniel.vetter@ffwll.ch Subject: Re: [Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock On Thu, Apr 23, 2015 at 03:07:54PM +0100, Peter Antoine wrote: > This patch fixes a possible kernel crash when drm_unlock > (DRM_IOCTL_UNLOCK) is called by a application that has not had a lock > created by it. This crash can be caused by any application from all users. Crashing the application is still a crash... -Chris -- Chris Wilson, Intel Open Source Technology Centre
On Thu, Apr 23, 2015 at 02:34:24PM +0000, Antoine, Peter wrote: > Before the patch the system required rebooting (driver crash and/or kernel panic). > Now the application gets terminated. It should have an GPF which should not have required a reboot, but terminated the application. Unless you set it to automatically reboot. -Chris
I picked up this work due to the following Jira ticket created by the security team (on Android) and was asked to give it a second look and found a few more issues with the hw lock code. https://jira01.devtools.intel.com/browse/GMINL-5388 I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) It also stops Linux as it kills the driver, I guess it might be possible to reload the gfx driver. On a unpatched system the test that is included in the issue or the igt test that has been posted for the issue will show the problem. I ran the test on an unpatched system here and the gui stopped and the keyboard stopped responding, so I rebooted. With the patched system I did not need to reboot. Should I change the SIGTERM to SIGSEGV, not quite the same thing but tooling is better at handling a segfault than a SIGTERM and the application that calls this IOCTL is using an uninitialised hw lock so it is kind of the same as differencing an uninitialised pointer (kind of). Or, I could just remove it, but the bug has been in the code for at least two years (and known about), and I would guess that any code that is calling this is fuzzing the IOCTLs (as this is how the security team found it) and we should reward them with a application exit. Peter. On Thu, 2015-04-23 at 15:39 +0100, Chris Wilson wrote: > On Thu, Apr 23, 2015 at 02:34:24PM +0000, Antoine, Peter wrote: > > Before the patch the system required rebooting (driver crash and/or kernel panic). > > Now the application gets terminated. > > It should have an GPF which should not have required a reboot, but > terminated the application. Unless you set it to automatically reboot. > -Chris >
On 24/04/15 06:52, Antoine, Peter wrote: > I picked up this work due to the following Jira ticket created by the > security team (on Android) and was asked to give it a second look and > found a few more issues with the hw lock code. > > https://jira01.devtools.intel.com/browse/GMINL-5388 > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) > > It also stops Linux as it kills the driver, I guess it might be possible > to reload the gfx driver. On a unpatched system the test that is > included in the issue or the igt test that has been posted for the issue > will show the problem. > > I ran the test on an unpatched system here and the gui stopped and the > keyboard stopped responding, so I rebooted. With the patched system I > did not need to reboot. > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but > tooling is better at handling a segfault than a SIGTERM and the > application that calls this IOCTL is using an uninitialised hw lock so > it is kind of the same as differencing an uninitialised pointer (kind > of). Or, I could just remove it, but the bug has been in the code for at > least two years (and known about), and I would guess that any code that > is calling this is fuzzing the IOCTLs (as this is how the security team > found it) and we should reward them with a application exit. > > Peter. SIGSEGV would be a better choice. SIGTERM is normally sent by a user -- it's the default signal sent by kill(1). It's also commonly used to tell a long-running daemon process to tidy up and exit cleanly. SIGSEGV commonly means "you accessed something that doesn't exist/isn't mapped/you don't have permissions for". There are specific subcases that can be indicated via the siginfo data; this is from the sigaction(1) manpage: The following values can be placed in si_code for a SIGSEGV signal: SEGV_MAPERR address not mapped to object SEGV_ACCERR invalid permissions for mapped object SIGBUS would also be a possibility but that's generally taken to mean that an access got all the way to some physical bus and then faulted, whereas SIGSEGV suggests the access was rejected during the virtual-to-physical mapping process. .Dave.
On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote: > On 24/04/15 06:52, Antoine, Peter wrote: > > I picked up this work due to the following Jira ticket created by the > > security team (on Android) and was asked to give it a second look and > > found a few more issues with the hw lock code. > > > > https://jira01.devtools.intel.com/browse/GMINL-5388 > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) > > > > It also stops Linux as it kills the driver, I guess it might be possible > > to reload the gfx driver. On a unpatched system the test that is > > included in the issue or the igt test that has been posted for the issue > > will show the problem. > > > > I ran the test on an unpatched system here and the gui stopped and the > > keyboard stopped responding, so I rebooted. With the patched system I > > did not need to reboot. > > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but > > tooling is better at handling a segfault than a SIGTERM and the > > application that calls this IOCTL is using an uninitialised hw lock so > > it is kind of the same as differencing an uninitialised pointer (kind > > of). Or, I could just remove it, but the bug has been in the code for at > > least two years (and known about), and I would guess that any code that > > is calling this is fuzzing the IOCTLs (as this is how the security team > > found it) and we should reward them with a application exit. > > > > Peter. > > SIGSEGV would be a better choice. > > SIGTERM is normally sent by a user -- it's the default signal sent by > kill(1). It's also commonly used to tell a long-running daemon process > to tidy up and exit cleanly. > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't > mapped/you don't have permissions for". There are specific subcases that > can be indicated via the siginfo data; this is from the sigaction(1) > manpage: > > The following values can be placed in si_code for a SIGSEGV signal: > > SEGV_MAPERR address not mapped to object > > SEGV_ACCERR invalid permissions for mapped object > > SIGBUS would also be a possibility but that's generally taken to mean > that an access got all the way to some physical bus and then faulted, > whereas SIGSEGV suggests the access was rejected during the > virtual-to-physical mapping process. None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate. -Chris
On 28/04/15 10:21, Dave Gordon wrote: > On 24/04/15 06:52, Antoine, Peter wrote: >> I picked up this work due to the following Jira ticket created by the >> security team (on Android) and was asked to give it a second look and >> found a few more issues with the hw lock code. >> >> https://jira01.devtools.intel.com/browse/GMINL-5388 >> I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) >> >> It also stops Linux as it kills the driver, I guess it might be possible >> to reload the gfx driver. On a unpatched system the test that is >> included in the issue or the igt test that has been posted for the issue >> will show the problem. >> >> I ran the test on an unpatched system here and the gui stopped and the >> keyboard stopped responding, so I rebooted. With the patched system I >> did not need to reboot. >> >> Should I change the SIGTERM to SIGSEGV, not quite the same thing but >> tooling is better at handling a segfault than a SIGTERM and the >> application that calls this IOCTL is using an uninitialised hw lock so >> it is kind of the same as differencing an uninitialised pointer (kind >> of). Or, I could just remove it, but the bug has been in the code for at >> least two years (and known about), and I would guess that any code that >> is calling this is fuzzing the IOCTLs (as this is how the security team >> found it) and we should reward them with a application exit. >> >> Peter. > > SIGSEGV would be a better choice. [snip] Nope, I've changed my mind about this. I thought the problematic case was just a process releasing the lock without having acquired it, but on further examination it's really that the DRM master process has gone away or otherwise deleted (or never created?) the lock. And THEN the (non-master?) process tries to release the (now) non-existent lock. But more importantly, there's existing code in the acquire-lock path that sends SIGTERM for this case, so obviously the release-lock code should be as similar as possible. .Dave.
On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris@chris-wilson.co.uk wrote: > On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote: > > On 24/04/15 06:52, Antoine, Peter wrote: > > > I picked up this work due to the following Jira ticket created by the > > > security team (on Android) and was asked to give it a second look and > > > found a few more issues with the hw lock code. > > > > > > https://jira01.devtools.intel.com/browse/GMINL-5388 > > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) > > > > > > It also stops Linux as it kills the driver, I guess it might be possible > > > to reload the gfx driver. On a unpatched system the test that is > > > included in the issue or the igt test that has been posted for the issue > > > will show the problem. > > > > > > I ran the test on an unpatched system here and the gui stopped and the > > > keyboard stopped responding, so I rebooted. With the patched system I > > > did not need to reboot. > > > > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but > > > tooling is better at handling a segfault than a SIGTERM and the > > > application that calls this IOCTL is using an uninitialised hw lock so > > > it is kind of the same as differencing an uninitialised pointer (kind > > > of). Or, I could just remove it, but the bug has been in the code for at > > > least two years (and known about), and I would guess that any code that > > > is calling this is fuzzing the IOCTLs (as this is how the security team > > > found it) and we should reward them with a application exit. > > > > > > Peter. > > > > SIGSEGV would be a better choice. > > > > SIGTERM is normally sent by a user -- it's the default signal sent by > > kill(1). It's also commonly used to tell a long-running daemon process > > to tidy up and exit cleanly. > > > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't > > mapped/you don't have permissions for". There are specific subcases that > > can be indicated via the siginfo data; this is from the sigaction(1) > > manpage: > > > > The following values can be placed in si_code for a SIGSEGV signal: > > > > SEGV_MAPERR address not mapped to object > > > > SEGV_ACCERR invalid permissions for mapped object > > > > SIGBUS would also be a possibility but that's generally taken to mean > > that an access got all the way to some physical bus and then faulted, > > whereas SIGSEGV suggests the access was rejected during the > > virtual-to-physical mapping process. > > None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate. Seconded, we really don't want to be in the business of fixing up the drm design mistakes of the past 15 years. As long as we can fully lock out this particular dragon when running i915 we're imo good enough. The dri1 design of a kernel shim driver cooperating with the ums driver for hw ownership is fundamentally unfixable. Also we can't change any of it for drivers actually using it since it'll break them, which is a big no-go. -Daniel
On Mon, 2015-05-04 at 15:52 +0200, Daniel Vetter wrote: > On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris@chris-wilson.co.uk wrote: > > On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote: > > > On 24/04/15 06:52, Antoine, Peter wrote: > > > > I picked up this work due to the following Jira ticket created by the > > > > security team (on Android) and was asked to give it a second look and > > > > found a few more issues with the hw lock code. > > > > > > > > https://jira01.devtools.intel.com/browse/GMINL-5388 > > > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) > > > > > > > > It also stops Linux as it kills the driver, I guess it might be possible > > > > to reload the gfx driver. On a unpatched system the test that is > > > > included in the issue or the igt test that has been posted for the issue > > > > will show the problem. > > > > > > > > I ran the test on an unpatched system here and the gui stopped and the > > > > keyboard stopped responding, so I rebooted. With the patched system I > > > > did not need to reboot. > > > > > > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but > > > > tooling is better at handling a segfault than a SIGTERM and the > > > > application that calls this IOCTL is using an uninitialised hw lock so > > > > it is kind of the same as differencing an uninitialised pointer (kind > > > > of). Or, I could just remove it, but the bug has been in the code for at > > > > least two years (and known about), and I would guess that any code that > > > > is calling this is fuzzing the IOCTLs (as this is how the security team > > > > found it) and we should reward them with a application exit. > > > > > > > > Peter. > > > > > > SIGSEGV would be a better choice. > > > > > > SIGTERM is normally sent by a user -- it's the default signal sent by > > > kill(1). It's also commonly used to tell a long-running daemon process > > > to tidy up and exit cleanly. > > > > > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't > > > mapped/you don't have permissions for". There are specific subcases that > > > can be indicated via the siginfo data; this is from the sigaction(1) > > > manpage: > > > > > > The following values can be placed in si_code for a SIGSEGV signal: > > > > > > SEGV_MAPERR address not mapped to object > > > > > > SEGV_ACCERR invalid permissions for mapped object > > > > > > SIGBUS would also be a possibility but that's generally taken to mean > > > that an access got all the way to some physical bus and then faulted, > > > whereas SIGSEGV suggests the access was rejected during the > > > virtual-to-physical mapping process. > > > > None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate. > > Seconded, we really don't want to be in the business of fixing up the drm > design mistakes of the past 15 years. As long as we can fully lock out > this particular dragon when running i915 we're imo good enough. The dri1 > design of a kernel shim driver cooperating with the ums driver for hw > ownership is fundamentally unfixable. > > Also we can't change any of it for drivers actually using it since it'll > break them, which is a big no-go. > -Daniel I will remove it. But, If you are using this code path the driver/kernel will have crashed. It covers a NULL pointer deference, so we are not changing the API that anyone is actually using. Peter.
On Tue, May 05, 2015 at 06:37:51AM +0000, Antoine, Peter wrote: > On Mon, 2015-05-04 at 15:52 +0200, Daniel Vetter wrote: > > On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris@chris-wilson.co.uk wrote: > > > On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote: > > > > On 24/04/15 06:52, Antoine, Peter wrote: > > > > > I picked up this work due to the following Jira ticket created by the > > > > > security team (on Android) and was asked to give it a second look and > > > > > found a few more issues with the hw lock code. > > > > > > > > > > https://jira01.devtools.intel.com/browse/GMINL-5388 > > > > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b) > > > > > > > > > > It also stops Linux as it kills the driver, I guess it might be possible > > > > > to reload the gfx driver. On a unpatched system the test that is > > > > > included in the issue or the igt test that has been posted for the issue > > > > > will show the problem. > > > > > > > > > > I ran the test on an unpatched system here and the gui stopped and the > > > > > keyboard stopped responding, so I rebooted. With the patched system I > > > > > did not need to reboot. > > > > > > > > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but > > > > > tooling is better at handling a segfault than a SIGTERM and the > > > > > application that calls this IOCTL is using an uninitialised hw lock so > > > > > it is kind of the same as differencing an uninitialised pointer (kind > > > > > of). Or, I could just remove it, but the bug has been in the code for at > > > > > least two years (and known about), and I would guess that any code that > > > > > is calling this is fuzzing the IOCTLs (as this is how the security team > > > > > found it) and we should reward them with a application exit. > > > > > > > > > > Peter. > > > > > > > > SIGSEGV would be a better choice. > > > > > > > > SIGTERM is normally sent by a user -- it's the default signal sent by > > > > kill(1). It's also commonly used to tell a long-running daemon process > > > > to tidy up and exit cleanly. > > > > > > > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't > > > > mapped/you don't have permissions for". There are specific subcases that > > > > can be indicated via the siginfo data; this is from the sigaction(1) > > > > manpage: > > > > > > > > The following values can be placed in si_code for a SIGSEGV signal: > > > > > > > > SEGV_MAPERR address not mapped to object > > > > > > > > SEGV_ACCERR invalid permissions for mapped object > > > > > > > > SIGBUS would also be a possibility but that's generally taken to mean > > > > that an access got all the way to some physical bus and then faulted, > > > > whereas SIGSEGV suggests the access was rejected during the > > > > virtual-to-physical mapping process. > > > > > > None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate. > > > > Seconded, we really don't want to be in the business of fixing up the drm > > design mistakes of the past 15 years. As long as we can fully lock out > > this particular dragon when running i915 we're imo good enough. The dri1 > > design of a kernel shim driver cooperating with the ums driver for hw > > ownership is fundamentally unfixable. > > > > Also we can't change any of it for drivers actually using it since it'll > > break them, which is a big no-go. > > -Daniel > > I will remove it. But, If you are using this code path the driver/kernel > will have crashed. It covers a NULL pointer deference, so we are not > changing the API that anyone is actually using. With ums/dri1 userspace can crash the kernel. That's pretty much by design, and it's pretty much unfixable. Trying to fix up the obvious ones like here is just cargo-culting. The only real option is to make absolutely sure we can't ever reach these codepaths at all from kms drivers like i915. Which tends to be a lot more work for auditing, but will be much more useful. -Daniel
diff --git a/drivers/gpu/drm/drm_lock.c b/drivers/gpu/drm/drm_lock.c index f861361..070dd5d 100644 --- a/drivers/gpu/drm/drm_lock.c +++ b/drivers/gpu/drm/drm_lock.c @@ -159,6 +159,14 @@ int drm_legacy_unlock(struct drm_device *dev, void *data, struct drm_file *file_ return -EINVAL; } + if (!master->lock.hw_lock) { + DRM_ERROR( + "Device has been unregistered. Hard exit. Process %d\n", + task_pid_nr(current)); + send_sig(SIGTERM, current, 0); + return -EPERM; + } + if (drm_legacy_lock_free(&master->lock, lock->context)) { /* FIXME: Should really bail out here. */ }
This patch fixes a possible kernel crash when drm_unlock (DRM_IOCTL_UNLOCK) is called by a application that has not had a lock created by it. This crash can be caused by any application from all users. Issue: VIZ-5485 Signed-off-by: Peter Antoine <peter.antoine@intel.com> --- drivers/gpu/drm/drm_lock.c | 8 ++++++++ 1 file changed, 8 insertions(+)